Abstract
Fine particulate matter (PM2.5) comprises diverse chemical components, including elemental carbon (EC), silicon (SI), sulfate (SO4), and calcium (CA), each linked to distinct health and environmental impacts. Accurately estimating the spatial and temporal distributions of these components is crucial for regulatory policy and public health. This study developed and evaluated multivariate machine learning models based on Random Forest (RF) and XGBoost (XGB) to estimate daily concentrations of EC, SI, SO4, and CA across the contiguous United States from 2000 to 2019. Unlike traditional univariate approaches, which fit a separate model for each component, multivariate models capture interdependencies among components, improving accuracy and efficiency. Using data from 534 monitoring sites and 187 predictor variables derived from satellite observations, reanalysis datasets, and geographical sources, we implemented univariate RF and XGB models and their multivariate counterparts (MRF and MXGBoost). Performance was assessed using R² metrics, and feature importance was evaluated with SHAP values. MXGBoost outperformed the other models, achieving R² values of 70.2 % for EC, 79.23 % for SO4, 61.57 % for SI, and 59.5 % for CA, with spatial R² exceeding 93 % and temporal R² as high as 82.23 % for SO4. Key predictors included wind speed, relative humidity, and aerosol optical depth. The findings highlight the advantages of multivariate modeling in capturing the interdependencies among PM2.5 components, which improves both estimation accuracy and computational efficiency. This approach has valuable applications in air quality management and public health, and the results point to the need to refine multivariate frameworks and to explore their applicability to other pollutants.
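The key idea behind the multivariate models is that one tree serves all components at once. As a minimal sketch, assuming a multi-output regression tree whose splits minimize squared error summed over all components (a common multi-output CART criterion; the exact splitting rule used for MRF and MXGBoost is not given in this abstract), the split search at a single node might look like the following. The function names `sse`, `best_shared_split`, and `leaf_means` are hypothetical, and the feature and target values are invented toy numbers, not data from the study.

```python
from typing import List, Tuple


def sse(targets: List[List[float]]) -> float:
    """Squared error around each output's mean, summed over all outputs."""
    if not targets:
        return 0.0
    total = 0.0
    for k in range(len(targets[0])):
        col = [t[k] for t in targets]
        mean = sum(col) / len(col)
        total += sum((v - mean) ** 2 for v in col)
    return total


def leaf_means(targets: List[List[float]]) -> List[float]:
    """A multi-output leaf predicts a vector: one mean per component."""
    n = len(targets)
    return [sum(t[k] for t in targets) / n for k in range(len(targets[0]))]


def best_shared_split(X: List[List[float]], Y: List[List[float]]) -> Tuple[int, float, float]:
    """Return (feature index, threshold, impurity reduction) of the single split
    that most reduces the summed multi-output SSE. Assumption: one split is
    shared by every component, unlike separate univariate trees."""
    parent = sse(Y)
    best = (-1, float("nan"), 0.0)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for x, y in zip(X, Y) if x[j] <= thr]
            right = [y for x, y in zip(X, Y) if x[j] > thr]
            gain = parent - sse(left) - sse(right)
            if gain > best[2]:
                best = (j, thr, gain)
    return best


# Toy data (invented): features = [wind_speed, relative_humidity],
# targets = [EC, SO4] for the same site-days.
X = [[1.0, 50.0], [2.0, 60.0], [3.0, 55.0], [4.0, 65.0]]
Y = [[1.0, 2.0], [1.2, 2.2], [3.0, 5.0], [3.2, 5.2]]

feature, threshold, gain = best_shared_split(X, Y)
left = [y for x, y in zip(X, Y) if x[feature] <= threshold]
right = [y for x, y in zip(X, Y) if x[feature] > threshold]
print(feature, threshold, round(gain, 6))  # split on feature 0 at 2.5
print(leaf_means(left), leaf_means(right))
```

Because each leaf stores a vector of component means, a single tree produces estimates for every component, which is one plausible source of the computational savings relative to fitting a separate univariate model per component.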
Original language | American English |
---|---|
Article number | 126161 |
Journal | Environmental Pollution |
Volume | 374 |
DOIs | |
State | Published - 1 Jun 2025 |
Externally published | Yes |
Keywords
- Elemental carbon
- Multivariate machine learning
- PM components
- Random forest
- Silicon
- Sulfate
- XGBoost
All Science Journal Classification (ASJC) codes
- Toxicology
- Pollution
- Health, Toxicology and Mutagenesis