Estimating ground-level PM2.5 concentrations by developing and optimizing machine learning and statistical models using 3 km MODIS AODs: case study of Tehran, Iran

Sotoudeheian, S ; Sharif University of Technology | 2021

  1. Type of Document: Article
  2. DOI: 10.1007/s40201-020-00509-5
  3. Publisher: Springer Science and Business Media Deutschland GmbH, 2021
  4. Abstract:
  5. Purpose: This study aimed to develop an optimized prediction model for estimating a fine-resolution grid of ground-level PM2.5 concentrations over Tehran. Using remote sensing data to obtain fine-resolution grids of particulate levels is challenging in highly polluted regions such as the Middle East, where brightly reflecting deserts are abundant. Methods: Several prediction models using 3 km AOD products from MODIS Collection 6 and various influential parameters were built to obtain a reliable estimator of ground-level PM2.5 concentrations. In this regard, a linear mixed effect model (LME), multi-variable linear regression model (MLR), Gaussian process model (GPM), artificial neural network (ANN) and support vector regression (SVR) were developed and their performance was compared. Since the LME and GPM outperformed the other models, they were further optimized based on meteorological and topographical variables. These models were used to estimate PM2.5 values over the highly polluted megacity of Tehran, Iran. Moreover, the influence of a site effect term on the performance of different forms of LME models was evaluated by including a random intercept for sites. Results: LME models without the site effect term explained 60–66% (RMSE = 9.6 to 10.3 μg/m3) and 35–41% (RMSE = 12.7 to 13.3 μg/m3) of the variability in ground-level PM2.5 concentrations during model fitting and cross-validation, respectively. Including the site effect term improved the performance of LME models during calibration and validation by 20% and 50% on average, respectively (18.5% and 17% decreases in the RMSE), compared with LME models without the site effect term. The optimized form of the LME model showed good agreement during both model fitting (R2 of 0.76) and cross-validation (R2 of 0.6).
Site-specific and seasonal performance of all model types revealed that LME models had the highest R2 values across all monitoring stations and all seasons during cross-validation. LME models performed best in May and March compared with other months during both model fitting and cross-validation. However, LME models had a significant weakness in predicting extreme PM2.5 values during cross-validation. Among all other model types, GPM, with an R2 of 0.59 and an RMSE of 10.2 μg/m3, performed best during cross-validation. Conclusions: While the best forms of LME and GPM had similar and reliable performance in predicting ground-level PM2.5 values during cross-validation, GPM was able to predict extreme values of ground-level PM2.5 concentrations, which was a weakness of LME models and is an important issue in polluted urban environments. In this respect, GPM could be a good alternative to LME models at high PM2.5 concentrations. The spatial distribution of estimated PM2.5 values showed that the central parts of Tehran were the most polluted area of the study region, consistent with ground-level PM2.5 measurements at the monitoring stations. © 2021, Springer Nature Switzerland AG
  6. Keywords:
  7. Aerosol ; Concentration (composition) ; Estimation method ; Machine learning ; MODIS ; Optical depth ; Optimization ; Particulate matter ; Prediction ; Remote sensing ; Statistical analysis ; Iran ; Tehran [Iran]
  8. Source: Journal of Environmental Health Science and Engineering ; Volume 19, Issue 1, 2021 ; 2052336X (ISSN)
  9. URL: https://pubmed.ncbi.nlm.nih.gov/34150215
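
Note on the "site effect term" mentioned in the abstract: the record does not give the model equations, so the following is only a generic sketch of a linear mixed effect model with a random intercept for sites, not necessarily the authors' specification. The symbols (i for an observation, j for a monitoring site, X_k for meteorological or topographical covariates, u_j for the site-specific intercept) are assumptions introduced here for illustration.

```latex
% Generic random-intercept LME (illustrative; not taken from the paper)
\begin{align}
  \mathrm{PM}_{2.5,ij} &= (\beta_0 + u_j) + \beta_1\,\mathrm{AOD}_{ij}
      + \sum_{k} \beta_k X_{k,ij} + \varepsilon_{ij}, \\
  u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Under this form, dropping the site effect term corresponds to fixing u_j = 0 for every site, which is the comparison the abstract reports in terms of R2 and RMSE.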