Precise and efficient prediction of ozone (O3) concentration is crucial for weather monitoring and environmental policymaking because high O3 pollution levels harm human health and ecosystems. However, the complexity of O3 formation mechanisms in the troposphere makes it difficult to model O3 accurately and quickly, especially in the absence of a process model. Data-driven machine learning techniques have demonstrated promising performance in modeling air pollution, particularly when a process model is unavailable. This study evaluates the predictive performance of nineteen machine learning models for ozone pollution prediction. Specifically, we assess how selecting input features using Random Forest affects O3 concentration prediction, and we investigate whether time-lagged measurements improve prediction accuracy. The data are air pollution and meteorological measurements collected at King Abdullah University of Science and Technology. Results show that dynamic models using time-lagged data outperform static and reduced machine learning models. Measured by RMSE, incorporating time-lagged data improves the accuracy of machine learning models by 300% and 200% relative to static and reduced models, respectively. Importantly, the best dynamic model with time-lagged information requires only 0.01 s, indicating its suitability for practical use. The Diebold-Mariano test, a statistical test for comparing the forecasting accuracy of models, is also conducted.
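The distinction between static and dynamic (time-lagged) models, and the Diebold-Mariano comparison mentioned above, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the series is synthetic rather than the KAUST measurements, the "static" model is a simple mean predictor, the "dynamic" model is ordinary least squares on three lags, and the helper names `make_lagged`, `rmse`, and `diebold_mariano` are hypothetical. It does not reproduce any of the nineteen models evaluated in the study.

```python
import math
import numpy as np

def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values preceding the target in y."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    X = np.column_stack([series[i:n - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def diebold_mariano(e1, e2):
    """DM statistic and two-sided normal p-value for one-step-ahead forecasts,
    using squared-error loss (no autocovariance correction, which assumes h = 1)."""
    d = np.asarray(e1, dtype=float) ** 2 - np.asarray(e2, dtype=float) ** 2
    dm = d.mean() / math.sqrt(d.var(ddof=0) / d.size)
    p = math.erfc(abs(dm) / math.sqrt(2.0))
    return float(dm), float(p)

# Synthetic autocorrelated series standing in for O3 readings (assumption, not real data).
rng = np.random.default_rng(0)
o3 = np.zeros(500)
for t in range(1, o3.size):
    o3[t] = 0.9 * o3[t - 1] + rng.normal()

X, y = make_lagged(o3, n_lags=3)
split = 400
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# "Static" baseline: ignores past measurements and predicts the training mean.
static_pred = np.full_like(y_test, y_train.mean())

# "Dynamic" model: least-squares fit on the lagged values plus an intercept.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
dynamic_pred = np.column_stack([np.ones(len(X_test)), X_test]) @ coef

rmse_static = rmse(y_test, static_pred)
rmse_dynamic = rmse(y_test, dynamic_pred)
dm_stat, p_value = diebold_mariano(y_test - static_pred, y_test - dynamic_pred)

print(f"RMSE static:  {rmse_static:.3f}")
print(f"RMSE dynamic: {rmse_dynamic:.3f}")
print(f"DM statistic: {dm_stat:.3f}, p-value: {p_value:.4f}")
```

A positive DM statistic here means the first set of errors (static) carries the larger squared-error loss. For multi-step forecasts the variance estimate would also need autocovariance terms, which this sketch omits.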
Bibliographical note
KAUST Repository Item: Exported on 2023-05-18
Acknowledged KAUST grant number(s): ORA-2022-5339
Acknowledgements: This publication is based upon work supported by King Abdullah University of Science and Technology (KAUST) Research Funding (KRF) from the Climate and Livability Initiative (CLI) under Award No. ORA-2022-5339. The authors would like to express their gratitude to the Health, Safety and Environment (HSE) Department at KAUST for providing the data used in this study.