PM<sub>2.5</sub> Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection

by: Baekcheon Kim, Eunkyeong Kim, Seunghwan Jung, Minseok Kim, Jinyong Kim, Sungshin Kim

Format: Article
Published: MDPI AG 2023-06-01

Description

Particulate matter (PM) in the air can cause various health problems and diseases in humans. In particular, the small size of PM<sub>2.5</sub> particles enables them to penetrate deep into the lungs, causing severe health impacts. Exposure to PM<sub>2.5</sub> can result in respiratory, cardiovascular, and allergic diseases, and prolonged exposure has also been linked to an increased risk of cancer, including lung cancer. Therefore, forecasting the PM<sub>2.5</sub> concentration in the surrounding environment is crucial for preventing these adverse health effects. This paper proposes a method for forecasting the PM<sub>2.5</sub> concentration 1 h ahead using bidirectional long short-term memory (Bi-LSTM). The proposed method involves selecting input variables based on the feature importance calculated by random forest, classifying the data to assign weight variables that reduce bias, and forecasting the PM<sub>2.5</sub> concentration using Bi-LSTM. To evaluate the performance of the proposed method, two case studies were conducted. The first compares forecasting performance according to the preprocessing applied. The second compares forecasting performance between deep learning models (long short-term memory, gated recurrent unit, and Bi-LSTM) and conventional machine learning models (multi-layer perceptron, support vector machine, decision tree, and random forest). In case study 1, the proposed method improves the performance indices (RMSE: 3.98%p, MAE: 5.87%p, RRMSE: 3.96%p, and R<sup>2</sup>: 0.72%p) because weights are assigned according to the input variables before forecasting is performed. In case study 2, we show that Bi-LSTM, which considers both directions (forward and backward), can forecast effectively when compared to conventional models (RMSE: 2.70, MAE: 0.84, RRMSE: 1.97, R<sup>2</sup>: 0.16). These results show that the proposed method can forecast PM<sub>2.5</sub> effectively even when data in the high-concentration range are insufficient.
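As a rough illustration of two ideas mentioned in the description (class-based weighting to reduce bias, and the four performance indices), the following is a minimal sketch and not the authors' implementation. The bin edges, the inverse-frequency weighting rule, the RRMSE definition (RMSE as a percentage of the mean observation), and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical bin edges (ug/m^3) separating concentration classes; not from the paper.
BIN_EDGES = (15.0, 35.0, 75.0)


def concentration_class(value, edges=BIN_EDGES):
    """Return the index of the first bin whose upper edge is >= value."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def inverse_frequency_weights(values, edges=BIN_EDGES):
    """Assumed rule: weight = n / (k * class count), so rare classes weigh more."""
    classes = [concentration_class(v, edges) for v in values]
    counts = Counter(classes)
    n, k = len(values), len(counts)
    return [n / (k * counts[c]) for c in classes]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rrmse(y_true, y_pred):
    """RMSE as a percentage of the mean observation (one common definition; assumed)."""
    mean_true = sum(y_true) / len(y_true)
    return 100.0 * rmse(y_true, y_pred) / mean_true


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

With this weighting rule, a data set dominated by low-concentration samples gives the few high-concentration samples proportionally larger weights, which is one plausible way to counter the scarcity of high-concentration data that the description mentions.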