This challenge aimed to understand how socio-demographic factors, tourist attractions, and people’s mobility level explain the demand for green spaces.
According to a UN report, “urban green areas offer great opportunities for positive change and the sustainable development of our cities” because they create spaces for outdoor activities. The World Health Organisation (WHO) adds that “urban green spaces, such as parks, playgrounds, and residential greenery, can promote mental and physical health, and reduce morbidity and mortality in urban residents by providing psychological relaxation and stress alleviation, stimulating social cohesion, supporting physical activity, and reducing exposure to air pollutants, noise and excessive heat.”
The challenge therefore asked how socio-demographic factors, tourist attractions, and people’s mobility levels around green spaces explain the demand for these spaces.
Goal
Create a model that predicts the average daily demand for green spaces and identifies the main contributing factors.
United Nations SDG
GOAL 11: Sustainable Cities and Communities
Datasets
The following datasets were provided to the participants:
Visits to green spaces, extracted from mobility data. The data is a single snapshot in time, not a time series. Provided by PSE.
The number of museums, parking lots, buildings, families, and people residing in the surrounding area of the green space. Provided by PSE.
Percentage of residents that are younger than 19 years old or part of the senior population. Provided by PSE.
Data
Three major criticisms of the dataset were identified: the lack of clarity on how the area of influence is calculated, the small size of the dataset, and the imbalance between the number of green spaces in Porto and in Lisbon. The participants noted that it would be better to work with raw data and, ideally, to use automatic visitor counting in the green areas.
One team supplemented the dataset with OpenStreetMap data (for example, cafes and restaurants), parish-level socio-demographic information, and pollution levels from OpenWeather. Other teams suggested adding further data, such as the location of the green space, safety data for the surrounding area, and the time visitors spend in the green area.
Methods and Techniques
Since the dataset was quite small, the teams started by examining the correlations between the features. Afterward, they trained different regression models with demand as the target.
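The correlation step can be illustrated with a minimal sketch. It assumes Pearson correlation, which the teams may or may not have used, and the column names and values are hypothetical toy data rather than the challenge dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy rows, one value per green space (NOT the challenge data).
parks = {
    "museums": [0, 1, 3, 2, 5, 4],
    "parking_lots": [6, 2, 5, 1, 4, 3],
    "demand": [10, 14, 22, 18, 30, 26],
}

# Correlate every feature with the target column.
correlations = {
    name: pearson(values, parks["demand"])
    for name, values in parks.items()
    if name != "demand"
}

# List features from strongest to weakest absolute correlation.
for name, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: r = {r:+.2f}")
```

With only a handful of rows, a coefficient like this is easily swayed by one or two parks, so it is better read as a screening aid than as evidence of an effect.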
One team tried linear regression, LASSO regression, ridge regression, and Random Forest algorithms, all of which produced very large MSE (mean squared error) and MAE (mean absolute error) values. Another team tried an even wider range of algorithms: Linear Regression, Decision Tree and Random Forest Regressor, K Nearest Neighbor, Ridge Regression, Bayesian Regression, Principal Component Regression, Polynomial Regression, and Partial Least Squares Regression.
The teams reported very weak results, mainly due to the size of the dataset. Most teams used feature importance scores to identify the factors that were likely most influential.
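For reference, the two error metrics are the mean squared error and the mean absolute error. The sketch below computes both from their standard definitions; the observed and predicted values are made-up numbers, not results from the challenge.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large misses more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; expressed in the target's own units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical daily visit counts for four green spaces (illustrative only).
observed = [120.0, 80.0, 200.0, 50.0]
predicted = [100.0, 90.0, 150.0, 55.0]

mse = mean_squared_error(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(f"MSE = {mse}, MAE = {mae}")
```

Because MSE squares each residual, a single badly predicted park can dominate it, which is one reason to report MAE alongside it on a small dataset.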
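The source does not say which feature importance measure the teams used. As one model-agnostic option, the sketch below uses permutation importance: shuffle one feature column at a time and measure how much the prediction error grows. The `hypothetical_model` function, the feature layout, and the data are all assumptions made for illustration.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, repeats=10, seed=0):
    """Mean increase in MAE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(repeats):
            # Shuffle only column j, leaving the other columns untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_absolute_error(targets, [predict(r) for r in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / repeats)
    return importances


def hypothetical_model(row):
    # Stand-in for a fitted regressor: uses feature 0 and ignores feature 1.
    return 5.0 + 3.0 * row[0]


# Hypothetical rows: [playgrounds, museums] per green space (illustrative only).
rows = [[0, 4], [1, 2], [2, 5], [3, 1], [4, 3], [5, 0]]
targets = [hypothetical_model(r) for r in rows]

importances = permutation_importance(hypothetical_model, rows, targets, n_features=2)
print(importances)
```

A feature the model never uses receives an importance of zero, while shuffling a feature it relies on raises the error. On a dataset as small as the one described here, such scores vary considerably between runs, so averaging over several shuffles is advisable.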
Main Insights from Data
The teams found it hard to compare the data between Lisbon and Porto, as Porto has five times fewer parks than Lisbon. Nevertheless, they noticed that what holds true for one city might not hold for the other. For example, in Lisbon the elderly population correlates positively with demand, while in Porto the correlation is negative.
Regarding the main driving factors, the teams found that accessibility to the park (see Figure 1), playgrounds, and a greater number of pedestrian streets matter most. The models had little predictive power, as the dataset was very small and contained no temporal information.