Machine learning and deep learning-driven methods for predicting ambient particulate matters levels: A case study

Amin Wu, Fouzi Harrou*, Abdelkader Dairi, Ying Sun

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Dust, or particulate matter (PM2.5), is among the most harmful pollutants affecting human health. Predicting indoor PM2.5 concentrations is essential to achieve acceptable indoor air quality. This study investigates data-driven models to accurately predict PM2.5 pollution. Notably, a comparative study is conducted between twenty-one machine learning and deep learning models. Specifically, we investigate their performance in predicting ambient PM2.5 concentrations based on other ambient pollutants, including SO2, NO2, O3, CO, and PM10. We applied Bayesian optimization to tune the hyperparameters of Gaussian process regression (GPR) with different kernels and of ensemble learning models (i.e., boosted trees and bagged trees) and investigated their prediction performance. Furthermore, to enhance the forecasting performance of the investigated models, dynamic information has been incorporated by introducing lagged measurements in the construction of the models. Results show a significant improvement in prediction performance when considering dynamic information from past data. Moreover, three methods, namely random forest (RF), decision tree, and extreme gradient boosting (XGBoost), are applied to assess variable contributions; they reveal that lagged PM2.5 data contribute significantly to the prediction performance and enable the construction of parsimonious models. Hourly concentration levels of ambient air pollution from the air quality monitoring network located in Seoul are employed to verify the prediction effectiveness of the studied models. Six measurements of effectiveness are used for assessing the prediction quality. Results showed that deep learning models are more efficient than the investigated machine learning models (i.e., support vector regression (SVR), GPR, bagged and boosted trees, RF, and XGBoost). In particular, the bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent unit (BiGRU) networks outperform both these machine learning models and the other deep learning models (i.e., LSTM, GRU, and convolutional neural network).
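
The abstract's idea of incorporating dynamic information through lagged measurements can be illustrated with a minimal sketch. It is not the authors' implementation: the function name make_lagged_features, the choice of two lags, and the synthetic toy data are all assumptions made for illustration. It appends past PM2.5 values as extra input columns alongside the current readings of the other pollutants.

```python
import numpy as np


def make_lagged_features(pollutants, pm25, n_lags):
    """Build a design matrix from current pollutant readings plus lagged PM2.5.

    pollutants: array of shape (T, k), e.g. columns for SO2, NO2, O3, CO, PM10
    pm25:       array of shape (T,), the target series
    n_lags:     number of past PM2.5 values appended as extra inputs
    Returns X of shape (T - n_lags, k + n_lags) and y of shape (T - n_lags,).
    (Hypothetical helper; the name and signature are illustrative only.)
    """
    pollutants = np.asarray(pollutants, dtype=float)
    pm25 = np.asarray(pm25, dtype=float)
    if n_lags < 1 or n_lags >= len(pm25):
        raise ValueError("n_lags must be between 1 and len(pm25) - 1")
    # Column j holds PM2.5 at time t - (j + 1) for each target time t.
    lags = np.column_stack(
        [pm25[n_lags - lag : len(pm25) - lag] for lag in range(1, n_lags + 1)]
    )
    # The first n_lags rows are dropped because they lack a full history.
    X = np.hstack([pollutants[n_lags:], lags])
    y = pm25[n_lags:]
    return X, y


# Toy data (synthetic, not from the Seoul monitoring network).
rng = np.random.default_rng(0)
toy_pollutants = rng.random((8, 5))   # 8 hourly rows, 5 pollutant columns
toy_pm25 = np.arange(8, dtype=float)  # stand-in PM2.5 series 0, 1, ..., 7
X, y = make_lagged_features(toy_pollutants, toy_pm25, n_lags=2)
```

The resulting X and y could then be passed to any of the regressors named in the abstract; the number of lags is a tuning choice that this sketch does not take from the paper.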

Original language: English (US)
Article number: e7035
Journal: Concurrency and Computation: Practice and Experience
Volume: 34
Issue number: 19
State: Published - Aug 30 2022

Bibliographical note

Funding Information:
King Abdullah University of Science and Technology, OSR-2019-CRG7-3800

Publisher Copyright:
© 2022 John Wiley & Sons, Ltd.

Keywords

  • air pollution
  • data-driven
  • deep learning
  • machine learning
  • PM2.5 forecasting

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics
