This paper presents a novel way of predicting and analyzing air traffic delays using publicly available social media data, with a focus on Twitter. Three machine learning regressors were trained on a passenger-centric dataset from 2017 and tested on predicting air traffic delays and cancellations up to five hours ahead for the first two months of 2018. A comparison of several accuracy measures of their prediction performance shows that this dataset contains useful information about the current and short-term future state of the air traffic system. The resulting methods yield higher prediction accuracy than traditional state-of-the-art and off-the-shelf time-series forecasting techniques applied to flight-centric data. Moreover, a post-training feature importance analysis conducted on the Random Forest regressor allowed the model to be simplified and refined, leading to faster training and more accurate predictions. This paper is a first step in predicting and analyzing air traffic delays by leveraging a real-time, publicly available, passenger-centric data source. The results suggest a method for using passenger-centric data sources in real time both as an estimator of the current state of air traffic delays in the United States and as an estimator of their short-term future state.
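The evaluation setup described above (features observed now, delays predicted some hours ahead, training on earlier data and testing on later data) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the hourly tweet counts, and the delay counts below are hypothetical, and the abstract does not state which accuracy measures were used, so mean absolute error is included only as one common choice.

```python
from datetime import datetime, timedelta


def make_supervised(timestamps, features, delays, horizon_hours):
    """Pair the features observed at hour t with the delay count at hour t + horizon.

    Each returned row is (target_time, feature_value, target_delay).
    """
    delay_at = dict(zip(timestamps, delays))
    rows = []
    for ts, x in zip(timestamps, features):
        target_time = ts + timedelta(hours=horizon_hours)
        # Keep only pairs whose target hour actually exists in the data.
        if target_time in delay_at:
            rows.append((target_time, x, delay_at[target_time]))
    return rows


def chronological_split(rows, test_start):
    """Split on the target time so no test-period label ends up in training."""
    train = [r for r in rows if r[0] < test_start]
    test = [r for r in rows if r[0] >= test_start]
    return train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 12 consecutive hours spanning the 2017/2018 boundary.
start = datetime(2017, 12, 31, 18)
timestamps = [start + timedelta(hours=i) for i in range(12)]
tweet_counts = [5, 7, 9, 12, 8, 6, 4, 3, 5, 10, 14, 11]  # assumed feature
delay_counts = [2, 3, 4, 6, 5, 3, 2, 1, 2, 5, 7, 6]      # assumed target

rows = make_supervised(timestamps, tweet_counts, delay_counts, horizon_hours=2)
train, test = chronological_split(rows, test_start=datetime(2018, 1, 1))
```

Any regressor could then be fitted on `train` and scored on `test` with `mean_absolute_error`, repeating the procedure for each horizon from one to five hours.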
|Original language||English (US)|
|Title of host publication||13th USA/Europe Air Traffic Management Research and Development Seminar 2019|
|State||Published - Jan 1 2019|