In late June, Nicki Kämpf and Jonas Krembsler presented first results from the research project “ReComMeND” on fare revenue forecasting for public transportation in Berlin at our doctoral colloquium. The project is funded by IFAF Berlin and supported by Berliner Verkehrsbetriebe AöR, Internationaler Controller Verein e.V., Lufthansa Systems GmbH & Co. KG, and Lufthansa Industry Solutions GmbH.
Their analysis is based on monthly fare revenue data for different product segments. The results will be used in a public transport research project that aims to automate revenue controlling and embed data-driven decision-making in the existing controlling processes.
The study aims to obtain suitable and reliable predictions with two families of methods: on the one hand, time series methods that rely only on the revenue history, such as ARIMA, SARIMA, and Holt-Winters exponential smoothing; on the other hand, methods that include exogenous variables, such as SARIMAX, multiple linear regression (MLR), LASSO, Ridge, Random Forests, Gradient Boosting, and Neural Networks. The data on exogenous variables are freely available and range from tourism data to labor market developments and weather data.
In their work, the researchers discuss the different methods and compare their predictions using common accuracy measures. The goal is to evaluate a wide range of methods and determine in which situations each one outperforms or underperforms the others.
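To illustrate why comparing methods with more than one accuracy measure matters, here is a minimal Python sketch. The post does not name the measures used in the study, so MAE, RMSE, and MAPE are an assumption, and the revenue figures and both forecasts are made-up values for illustration only.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if an actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly revenues (arbitrary units) and two hypothetical forecasts.
actual = [100.0, 110.0, 120.0, 130.0]
forecasts = {
    "method_a": [98.0, 112.0, 118.0, 132.0],   # small error in every month
    "method_b": [100.0, 110.0, 120.0, 122.0],  # exact except one large miss
}

scores = {
    name: {"MAE": mae(actual, f), "RMSE": rmse(actual, f), "MAPE": mape(actual, f)}
    for name, f in forecasts.items()
}

for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

In this toy case both forecasts have the same MAE, RMSE favors the forecast with evenly spread small errors, and MAPE favors the one with a single large miss, which is one reason a ranking of methods can depend on the measure chosen.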
Beyond prediction accuracy, the study also covers feature selection and the interpretation of each feature's impact. The researchers address automatic feature selection using traditional approaches such as AIC optimization, a rolling-window cross-validation approach that optimizes the cross-validation error, and algorithmic approaches such as LASSO or Bayesian optimization. They also discuss the interpretability of the results and the advantages and disadvantages of the different approaches.
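The rolling-window cross-validation idea can be sketched as follows. This is not the researchers' implementation: the window length, the forecast horizon, the two stand-in forecasters, and the synthetic series are all assumptions chosen to keep the example short. In the study's setting, the candidates being scored would presumably be feature sets or models rather than these simple baselines.

```python
def rolling_window_splits(n_obs, window, horizon):
    """Yield (train_indices, test_indices) for a fixed-length window sliding forward in time."""
    start = 0
    while start + window + horizon <= n_obs:
        train = list(range(start, start + window))
        test = list(range(start + window, start + window + horizon))
        yield train, test
        start += horizon  # move the window forward by one forecast horizon

def seasonal_naive(history, horizon, period=12):
    """Stand-in forecaster: repeat the value observed one season earlier."""
    return [history[len(history) - period + (h % period)] for h in range(horizon)]

def mean_forecast(history, horizon):
    """Stand-in forecaster: predict the mean of the training window."""
    m = sum(history) / len(history)
    return [m] * horizon

def cv_error(series, forecaster, window, horizon):
    """Mean absolute error over all rolling-window test points."""
    errors = []
    for train_idx, test_idx in rolling_window_splits(len(series), window, horizon):
        history = [series[i] for i in train_idx]
        preds = forecaster(history, horizon)
        errors.extend(abs(series[i] - p) for i, p in zip(test_idx, preds))
    return sum(errors) / len(errors)

# Synthetic, purely seasonal monthly series (three identical years); not real revenue data.
pattern = [10, 12, 15, 18, 22, 25, 27, 26, 21, 17, 13, 11]
series = [100 + p for _ in range(3) for p in pattern]

candidates = {"seasonal_naive": seasonal_naive, "mean": mean_forecast}
cv_scores = {name: cv_error(series, f, window=24, horizon=3) for name, f in candidates.items()}
best = min(cv_scores, key=cv_scores.get)
print(cv_scores, "-> selected:", best)
```

Because every test block lies strictly after its training window, the error estimate respects the time order of the data, which ordinary shuffled cross-validation would not.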