PM2.5 Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data

By: Mehdi Zamani Joharestani, Chunxiang Cao, Xiliang Ni, Barjeece Bashir, Somayeh Talebiesfandarani

Format: Article
Published: MDPI AG, 2019-07-01

Description

In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM2.5) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM2.5 concentrations can help governments warn people at high risk, thereby mitigating these complications. Although attempts have been made to predict PM2.5 concentrations, the factors influencing PM2.5 prediction have not been investigated. In this work, we study feature importance for PM2.5 prediction in Tehran’s urban area using three machine learning (ML) approaches: random forest, extreme gradient boosting (XGBoost), and deep learning. We use 23 features in the modeling, including satellite and meteorological data, ground-measured PM2.5, and geographical data. The best model performance, R² = 0.81 (R = 0.9), MAE = 9.93 µg/m³, and RMSE = 13.58 µg/m³, was obtained with the XGBoost approach after eliminating unimportant features. All three ML methods performed similarly: R² ranged from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included, and from 0.77 to 0.81 when it was excluded. Unlike lagged PM2.5 data, satellite-derived AODs did not improve model performance.
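The evaluation metrics quoted above (R, R², MAE, RMSE) can be illustrated with a minimal sketch. It assumes that R² here is the square of the Pearson correlation R between observed and predicted PM2.5, which is consistent with the reported pair R² = 0.81 and R = 0.9; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's data or code.

```python
import math


def regression_metrics(observed, predicted):
    """Return Pearson R, R squared, MAE and RMSE for paired observations.

    Hypothetical helper: R2 is computed here as the square of Pearson R,
    an assumption based on the R2 = 0.81 / R = 0.9 pair in the abstract.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Pearson correlation: covariance divided by the product of spreads.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)

    # Error metrics share the units of the inputs (e.g. µg/m³ for PM2.5).
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

    return {"R": r, "R2": r * r, "MAE": mae, "RMSE": rmse}


if __name__ == "__main__":
    # Made-up PM2.5 values (µg/m³), for illustration only.
    obs = [10.0, 20.0, 30.0, 40.0]
    pred = [12.0, 18.0, 33.0, 37.0]
    print(regression_metrics(obs, pred))
```

For the made-up values above, MAE is 2.5 µg/m³ and RMSE is √6.5 ≈ 2.55 µg/m³. Many libraries instead report R² as the coefficient of determination, 1 − SS_res/SS_tot, which in general need not equal the squared correlation, so it is worth checking which definition a given R² refers to.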