It is estimated that 9 out of 10 people worldwide live in places where air pollution exceeds WHO guideline limits [1]. High levels of air pollution put people at risk of diseases such as respiratory infections, lung cancer, and heart disease. The most health-harmful pollutants are PM2.5 particles, which penetrate deep into lung passageways.
The Green Mile is a project initiated by UNStudio, Blendingbricks, Heineken, the Rijksmuseum, the Amsterdam University of Applied Sciences, and the Dutch National Bank. It aims to transform Stadhouderskade, a street in Amsterdam that is currently the city's most polluted and busiest, and the one with the most traffic and pedestrian accidents.
The main sources of pollution are road traffic and industry, and people report feeling the effects of poor air quality when they spend long periods on Stadhouderskade. As a result, people are unsurprisingly not inclined to spend more time there than strictly necessary [2]. Death rates attributed to air pollution in the Netherlands decreased by approximately 45% between 1990 and 2014 but have plateaued since 2014 [3], which renews the need to protect air quality, not only on Stadhouderskade but everywhere.
The goal of this challenge was to help the initiators of the project build a case for, and generate buzz around, the needed change on Stadhouderskade, focusing specifically on the street's current impact on air pollution.
GOAL 11: Sustainable Cities and Communities
The following datasets were provided to the participants:
No team used additional data to solve this challenge; all analyses relied solely on the provided datasets.
Several teams pointed out the critical relation between air pollution and weather conditions and how intrinsically related these two variables are. Wind, for example, plays a big role in determining the travel patterns of air pollutants since it can transport them. For that reason, one team mentioned that having hourly measurements of air pollution but only daily weather measurements posed a problem in analyzing the data.
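One common way to work around this mismatch in granularity is to aggregate the hourly pollution readings to daily values before joining them with the daily weather data. The sketch below illustrates the idea with pandas; the column names (`no2`, `wind_speed`) and all values are hypothetical, and aggregating by the daily mean is an assumption rather than the approach any team reported.

```python
import pandas as pd

# Hypothetical hourly NO2 readings (ug/m3) covering two days.
hourly_pollution = pd.DataFrame(
    {"no2": [40.0] * 24 + [20.0] * 24},
    index=pd.date_range("2022-01-01", periods=48, freq=pd.Timedelta(hours=1)),
)

# Hypothetical daily weather measurements for the same two days.
daily_weather = pd.DataFrame(
    {"wind_speed": [3.0, 7.5]},
    index=pd.date_range("2022-01-01", periods=2, freq="D"),
)

# Aggregate the hourly series to one value per day (assumption: daily mean),
# then align both tables on the shared daily timestamp.
daily_pollution = hourly_pollution.resample("D").mean()
merged = daily_pollution.join(daily_weather, how="inner")
```

The trade-off is that any within-day variation, such as rush-hour peaks, is averaged away, which is exactly the loss of information the team pointed out.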
All teams started with exploratory data analysis (EDA), examining the descriptive statistics of each variable and their pairwise relations through scatter plots and correlation values. In addition, all teams looked into variations across different time frames and checked for missing data.
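As a rough illustration of these first steps, the following sketch computes descriptive statistics, pairwise correlations, and missing-value counts with pandas. The data frame and its column names are invented for the example and do not come from the challenge datasets.

```python
import pandas as pd

# Hypothetical measurements; one PM2.5 reading is missing.
df = pd.DataFrame(
    {
        "no2": [10.0, 20.0, 30.0, 40.0],
        "pm25": [5.0, 10.0, float("nan"), 20.0],
        "wind_speed": [8.0, 6.0, 4.0, 2.0],
    }
)

summary = df.describe()          # count, mean, std, min, quartiles, max per column
correlations = df.corr()         # pairwise Pearson correlation, ignoring missing pairs
missing_counts = df.isna().sum() # number of missing values per column
```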
During data cleaning, one team established a maximum threshold for pollutant variables after identifying unusual/extreme values in the series using a moving average plot. This team fixed missing data problems using linear interpolation for missing observations that were, at most, one day apart from a known observation. The remaining missing values were discarded from the analysis. Another team used a 3-point rolling mean to fill null values.
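The cleaning steps described above could look roughly like the sketch below. It uses a daily series for brevity, so "one day apart" becomes `limit=1`; the threshold value, the sample data, and the choice of a centered rolling window are all assumptions made for illustration, since the teams' exact settings are not given.

```python
import pandas as pd

MAX_THRESHOLD = 200.0  # hypothetical cap for implausible pollutant readings

# Hypothetical daily readings with gaps and one extreme value (500.0).
raw = pd.Series(
    [10.0, None, 30.0, 500.0, 50.0, None, None, None, None, 100.0],
    index=pd.date_range("2022-01-01", periods=10, freq="D"),
)

# Step 1: treat values above the threshold as missing.
capped = raw.where(raw <= MAX_THRESHOLD)

# Step 2: linearly interpolate only gaps within one step of a known value.
interpolated = capped.interpolate(
    method="linear", limit=1, limit_direction="both", limit_area="inside"
)

# Step 3: discard whatever is still missing.
cleaned = interpolated.dropna()

# Alternative from another team: fill nulls with a 3-point rolling mean
# (assumption: centered window, at least one valid neighbor required).
rolling_filled = capped.fillna(
    capped.rolling(window=3, center=True, min_periods=1).mean()
)
```

In this example the four-day gap is only filled at its two edges by the interpolation, and its two middle days are dropped, which mirrors the rule of discarding values too far from a known observation.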
Regarding feature engineering, several teams computed the Common Air Quality Index, which provides a unified view of the air quality at any given moment, taking into consideration three of the measured pollutants: Nitrogen Dioxide (NO2), Particulate Matter 2.5 (PM2.5), and Particulate Matter 10 (PM10). One team also calculated mutual information, permutation importance, and Principal Component Analysis.
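To make the index concrete, here is a minimal sketch of how an hourly Common Air Quality Index can be computed: each pollutant is mapped to a sub-index by linear interpolation between grid breakpoints, and the overall index is the worst (highest) sub-index. The breakpoint values below are an assumption based on the commonly published hourly CAQI grid and should be verified against the official definition before use; the function names are hypothetical, and capping values above the top breakpoint at 100 is a simplification.

```python
# Assumed hourly CAQI breakpoints in ug/m3; each list maps to index
# values 0, 25, 50, 75, 100. Verify against the official grid before use.
BREAKPOINTS = {
    "no2": [0, 50, 100, 200, 400],
    "pm10": [0, 25, 50, 90, 180],
    "pm25": [0, 15, 30, 55, 110],
}
INDEX_LEVELS = [0, 25, 50, 75, 100]


def sub_index(pollutant, concentration):
    """Map one pollutant concentration to a 0-100 sub-index."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    grid = BREAKPOINTS[pollutant]
    if concentration >= grid[-1]:
        return 100.0  # simplification: cap instead of reporting "above 100"
    for i in range(len(grid) - 1):
        low, high = grid[i], grid[i + 1]
        if low <= concentration < high:
            # Linear interpolation inside the 25-point band.
            return INDEX_LEVELS[i] + (concentration - low) / (high - low) * 25.0


def caqi(no2, pm10, pm25):
    """Overall index: the highest of the three sub-indices."""
    return max(
        sub_index("no2", no2),
        sub_index("pm10", pm10),
        sub_index("pm25", pm25),
    )
```

For example, `caqi(no2=75, pm10=25, pm25=15)` is driven by NO2, whose concentration sits halfway through the second band, while PM10 and PM2.5 both sit exactly on the first breakpoint.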
In terms of time series modeling, several teams evaluated stationarity using the Dickey-Fuller test, examined autocorrelation, and fitted ARIMA (Autoregressive Integrated Moving Average) or SARIMA (Seasonal Autoregressive Integrated Moving Average) models. Others used XGBoost and LightGBM, but their performance was not significantly better.
Several teams discovered that all air pollutants showed a decreasing trend from 2014 to 2022, with decreases ranging from 11% to 81%. The compounds Xylene (81%) and Toluene (71%) decreased the most, suggesting that Stadhouderskade was already on the right track toward reducing air pollution.
One team used geographical data on the location of outdoor activities in the areas around Stadhouderskade to show that there were no running routes on this street and that there was only one sports park in its vicinity. The same team also inferred heavy traffic congestion from the high concentration of pollutants typically emitted by motor vehicles.
One team found that, except for NO2, whose values were lower in the early mornings than over the rest of the day, no pollutant showed a clear daily concentration pattern. However, there seems to be a pattern throughout the year: from May to August (summer), the pollutant values decrease and the air quality index increases; in December, the pollution levels increase considerably.
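Daily and yearly patterns like these are typically found by averaging a series per hour of day and per month. The sketch below does this on a synthetic hourly NO2 series that was deliberately built to be lower in the early morning and from May to August; the data and names are illustrative only and do not reproduce the team's results.

```python
import pandas as pd

# Synthetic hourly NO2 series for 2022 (365 days).
index = pd.date_range(
    "2022-01-01", periods=24 * 365, freq=pd.Timedelta(hours=1)
)
values = [
    (20.0 if ts.hour < 6 else 40.0)         # lower before 06:00
    + (0.0 if 5 <= ts.month <= 8 else 10.0)  # lower from May to August
    for ts in index
]
no2 = pd.Series(values, index=index, name="no2")

# Average profile over the day and over the year.
mean_by_hour = no2.groupby(no2.index.hour).mean()
mean_by_month = no2.groupby(no2.index.month).mean()
```

Plotting `mean_by_hour` and `mean_by_month` for each pollutant side by side makes it easy to see which ones share a pattern and which do not.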
As a way to productize the developed algorithm, the vast majority of teams suggested developing some type of dashboard or application that would enable city planners to view the different levels of air pollutants at any given time, along with predictions for other times in the future.
The main outcome of this product would be changing the city policy using the gathered data. One team suggested the following examples:
Another team proposed as metrics the number of days with acceptable/unacceptable levels and the percentage of air pollution decrease after deployment of their product.
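Both metrics are simple to compute once daily values are available. The sketch below shows one possible formulation; the function names, the sample values, and the limit of 25 are hypothetical, since the team did not specify what counts as an acceptable level.

```python
def count_days_by_level(daily_values, limit):
    """Count days at or below the limit (acceptable) and above it."""
    acceptable = sum(1 for value in daily_values if value <= limit)
    return {
        "acceptable": acceptable,
        "unacceptable": len(daily_values) - acceptable,
    }


def percent_decrease(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Hypothetical daily index values and an assumed acceptability limit.
day_counts = count_days_by_level([10, 30, 50, 20], limit=25)
decrease = percent_decrease(before=40.0, after=30.0)
```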
Another team proposed that, using their analysis, city planners could create efficient traffic control policies (e.g., re-routing traffic at certain times of the day) or even additional anti-pollution policies, such as limiting the use of specific fireworks on New Year's Eve to reduce pollution levels at critical moments. On another note, the analysis could also be used to create articles or media campaigns that raise public awareness of the pollution problem.
[1] World Health Organization. “Air pollution”. Available at: https://www.who.int/health-topics/air-pollution#tab=tab_1
[2] Amsterdam Air Quality Institute. “Air quality in Amsterdam”. Available at: https://www.iqair.com/netherlands/north-holland/amsterdam
[3] Our World in Data. “Deaths from air pollution, 1990 to 2019”. Available at: https://ourworldindata.org/grapher/air-pollution-deaths-country?tab=chart&country=~NLD
[4] Szarata, A., Nosal, K., Duda-Wiertel, U. and Franek, L., 2017. The impact of the car restrictions implemented in the city centre on the public space quality. Transportation Research Procedia, 27, pp.752-759.