Predicting ground-level PM2.5 using high-resolution satellite imagery and machine learning algorithms with ground-based validation
Publication, Journal Article
Tariq, MH; Hussain, M; Khaliq, MM; Saeed, T; Bergin, MH; Khokhar, MF
Published in: International Journal of Remote Sensing
This study predicts ground-level PM2.5 concentration (µg/m3) from satellite imagery using a deep learning feature extractor followed by a machine learning regressor. A convolutional neural network (CNN) based on a modified version of the VGG16 model extracts deep features from the satellite images. The CNN outputs are concatenated with meteorological features (temperature and relative humidity) and seasonal factors (month, day, and year), then fed into a Random Forest regression model that predicts the PM2.5 concentration. The methodology is tested at sites in Rawalpindi and Islamabad, Pakistan, including road sites and non-road (residential) sites. Several experiments were performed to test the robustness of the pipeline. In the first experiment, the model is trained and tested on a shuffled dataset of all 30 sites, achieving a minimum Mean Absolute Error (MAE) of 11.27 µg/m3, a Root Mean Square Error (RMSE) of 16.07 µg/m3, and a Pearson correlation of 0.85 between the actual and predicted PM2.5 concentrations. In the second experiment, separate models are trained for each site type and evaluated on unseen data, achieving a minimum RMSE of 16.08 µg/m3, an MAE of 12.64 µg/m3, and a Pearson correlation of 0.84. The results demonstrate the effectiveness of the proposed methodology in predicting PM2.5 concentrations and highlight the potential for further improvement by targeting site-specific characteristics.
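The three evaluation metrics named in the abstract (MAE, RMSE, and Pearson correlation) can be illustrated with a minimal sketch. This is not the authors' code: the function names `mae`, `rmse`, and `pearson_r` are hypothetical, and the sample values are made up for illustration and are not data from the study.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: penalizes large errors more heavily than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Made-up PM2.5 values (µg/m3) for illustration only; not from the study.
actual = [40.0, 55.0, 70.0, 90.0]
predicted = [45.0, 50.0, 80.0, 85.0]

print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"r:    {pearson_r(actual, predicted):.2f}")
```

MAE and RMSE are in the same units as the measurements, so they can be read directly as µg/m3, while the Pearson correlation is unitless and only describes how closely predictions track the observed values, not how large the errors are.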