According to a UN report, “urban green areas offer great opportunities for positive change and the sustainable development of our cities” by creating spaces for outdoor activities. Likewise, according to the World Health Organisation (WHO), “urban green spaces, such as parks, playgrounds, and residential greenery, can promote mental and physical health, and reduce morbidity and mortality in urban residents by providing psychological relaxation and stress alleviation, stimulating social cohesion, supporting physical activity, and reducing exposure to air pollutants, noise and excessive heat.”
The goal of the challenge is to understand how socio-demographic factors, tourist attractions, and people’s mobility level around the green spaces explain the demand for these spaces.
Create a model that predicts the average daily demand for green spaces and identifies the main contributing factors.
GOAL 11: Sustainable Cities and Communities
The following datasets were provided to the participants:
Three major criticisms of the dataset were identified: the lack of clarity on how the area of influence is calculated, the small size of the dataset, and the imbalance between the number of green spaces in Porto and in Lisbon. The participants noted that it would be better to work with raw data and, ideally, to use automatic visitor counting in the green areas to obtain better results.
One team supplemented the dataset with OpenStreetMap data (cafes and restaurants, for example), parish socio-demographic information, and pollution levels from OpenWeather. Other teams suggested adding further data, such as the location of the green space, safety data for the surrounding area, and the time visitors spend in the green area.
Since the dataset was quite small, the teams started by examining the correlations between the features. Afterward, they trained different regression models with demand as the target.
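The correlation step described above can be sketched roughly as follows. This is a minimal illustration, not the teams' actual code: the data is synthetic, and the feature names (`playgrounds`, `pedestrian_street_km`, `elderly_share`) are assumptions chosen only to echo the factors mentioned in this report.

```python
import numpy as np

# Synthetic toy data, NOT the challenge dataset. All names are hypothetical.
rng = np.random.default_rng(0)
n_parks = 30
features = {
    "playgrounds": rng.integers(0, 5, n_parks).astype(float),
    "pedestrian_street_km": rng.uniform(0, 10, n_parks),
    "elderly_share": rng.uniform(0.1, 0.4, n_parks),
}
# Assumed relationship, used only to generate a demand column to correlate against.
demand = (
    50 * features["playgrounds"]
    + 20 * features["pedestrian_street_km"]
    + rng.normal(0, 30, n_parks)
)


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Correlation of every feature with the target.
correlations = {name: pearson(col, demand) for name, col in features.items()}

# Print the features ordered by absolute correlation strength.
for name, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>22s}: r = {r:+.2f}")
```

With only a few dozen rows, such coefficients are noisy, so they are better read as hints about which features to inspect than as evidence of an effect.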
One team tried linear regression, LASSO regression, ridge regression, and Random Forest algorithms, all of which yielded very large MSE (Mean Squared Error) and MAE (Mean Absolute Error). Another team tried an even wider range of algorithms: Linear Regression, Decision Tree and Random Forest Regressor, K Nearest Neighbor, Ridge Regression, Bayesian Regression, Principal Component Regression, Polynomial Regression, and Partial Least Squares Regression.
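To make the two error metrics concrete, the sketch below computes MSE and MAE for a ridge regression evaluated with leave-one-out cross-validation, which is one reasonable way to score models on a very small dataset. It is an illustrative assumption, not the evaluation protocol the teams reported: the data is synthetic, the ridge model is a plain closed-form implementation, and `alpha=1.0` is an arbitrary choice.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def loo_predictions(X, y, alpha=1.0):
    """Predict each row with a model trained on all the other rows."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        coef, intercept = fit_ridge(X[train], y[train], alpha)
        preds[i] = X[i] @ coef + intercept
    return preds


# Synthetic toy data, NOT the challenge dataset (features already on a common scale).
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))
y = 100 + 40 * X[:, 0] - 15 * X[:, 1] + rng.normal(0, 20, 25)

ridge_pred = loo_predictions(X, y, alpha=1.0)
# Baseline: predict each park's demand as the mean demand of the other parks.
baseline_pred = np.array([y[np.arange(len(y)) != i].mean() for i in range(len(y))])

print(f"ridge    MSE={mse(y, ridge_pred):.1f}  MAE={mae(y, ridge_pred):.1f}")
print(f"baseline MSE={mse(y, baseline_pred):.1f}  MAE={mae(y, baseline_pred):.1f}")
```

Comparing against the mean-only baseline gives the raw error values some context: a model whose MSE and MAE are no better than the baseline has learned little from the features.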
The teams reported very weak results, mainly due to the size of the dataset. Most teams used feature importance to identify the factors that were most influential.
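The report does not say which feature importance method the teams used. As one model-agnostic possibility, the sketch below uses permutation importance: shuffle one feature at a time and measure how much the model's error increases. Everything here is an assumption made for illustration, including the synthetic data, the hypothetical feature names, and the choice of an ordinary least squares model.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    A = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(A, y, rcond=None)
    return weights


def predict(weights, X):
    return weights[0] + X @ weights[1:]


def permutation_importance(weights, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((y - predict(weights, X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            permuted_error = np.mean((y - predict(weights, X_perm)) ** 2)
            increases.append(permuted_error - base_error)
        importances[j] = np.mean(increases)
    return importances


# Synthetic toy data, NOT the challenge dataset. Feature names are hypothetical.
feature_names = ["accessibility", "playgrounds", "unrelated_feature"]
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
# Assumed relationship: the third feature has no effect on demand.
y = 60 * X[:, 0] + 25 * X[:, 1] + rng.normal(0, 10, 30)

weights = fit_linear(X, y)
importances = permutation_importance(weights, X, y)
ranking = [feature_names[j] for j in np.argsort(importances)[::-1]]
print(ranking)
```

Here the importance is measured on the same rows used for fitting, which is a simplification. When the underlying model predicts poorly, as the teams reported, any such ranking should be treated as tentative.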
The teams found it hard to compare the data between Lisbon and Porto, as Porto has five times fewer parks than Lisbon. Nevertheless, they noticed that what holds true for one city may not hold for another: for example, in Lisbon the share of elderly population correlates positively with demand, while in Porto the correlation is negative.
Regarding the main driving factors, the teams found that accessibility to the park (see Figure 1), the presence of playgrounds, and a greater number of pedestrian streets were the most important. The predictive power of the models was low, as the dataset was very small and lacked temporal information.