An RF-CNN pipeline for predicting PM2.5 concentration in Sri Lanka
Publication, Journal Article
Attanayake, G; Senarathna, M; Bergin, M; Carlson, D; Bhave, PV; Bowatte, G; Harischandra, N
Published in: Journal of Hazardous Materials Advances
Air pollution is a considerable global public health threat, requiring efficient monitoring and forecasting to guide decision-making. This study introduces a cascaded model that combines an enhanced Random Forest with a Convolutional Neural Network (RF-CNN) to predict spatiotemporal fluctuations in PM2.5 concentrations throughout Sri Lanka. The model uses data from 24 low-cost PM2.5 sensors distributed throughout the country, with missing values imputed by the K-Nearest Neighbors method. The Convolutional Neural Network (CNN) derives spatial features from four-band PlanetScope satellite images (3 m/pixel resolution, 1 km² spatial coverage), while the Random Forest (RF) component models the relationship between PM2.5 levels and four meteorological parameters. These features, combined with meteorological, spatial, and temporal inputs, produce the final forecasting results. The dataset comprises 1934 satellite images collected between December 2022 and February 2024, with an average PM2.5 concentration of approximately 15 μg/m³. The RF-CNN model performed robustly across a variety of climate zones, with a normalized root mean square error of approximately 32.4 %, a mean absolute percentage error of approximately 25.7 %, a normalized mean absolute error of approximately 22.8 %, a Spearman r of 0.871, and a Pearson r of 0.873. Two metrics, the Input Data Quality Score (IDQS) and the Testing Data Quality Score (TDQS), were implemented to evaluate the effects of imputation. Imputation within acceptable ranges had minimal impact on performance, whereas exceeding those limits increased uncertainty. This research emphasizes the efficacy of the RF-CNN approach, which integrates satellite imagery and low-cost sensor data, as a scalable solution for predicting spatiotemporal PM2.5 variations. It provides valuable insights for regions that lack extensive monitoring.