The Avencas Marine Protected Area (AMPA) is a Biophysical Interest Zone in Cascais, Portugal. The AMPA has been under close observation since 2010, with regular biodiversity sampling, and was the subject of a case study by Ferreira et al. (2017).
To help protect this unique marine ecosystem, measures were taken to reduce human interference, but the system did not recover as well as expected. The municipality of Cascais is therefore looking for help with a long-term analysis of changes in species abundance in the AMPA.
The two main focus areas are:
Identify variables that potentially impact the marine ecosystem of the Avencas Marine Protected Area and predict further developments with a special focus on endangered and invasive species.
GOAL 14: Life below water
The following datasets were provided to the participants:
Cascais’ challenge centered on identifying new variables that might be impacting the marine ecosystem of the AMPA. Due to the exploratory nature of the challenge, teams used many additional data sources. The following list is a selection.
One team suggested setting up a measurement station close to the AMPA to gather data on chemical and physical properties directly in the area of interest.
To determine feature importance, the teams' approaches to this challenge ranged from SARIMA and LASSO to gradient boosting, XGBoost and random forest regressors. For predictive modeling, several teams relied on time-series models such as SARIMA and VAR.
For data preprocessing, one team used PCA to reduce the dimensionality of the environmental data.
One team developed SARIMA models to predict three targets: the number of invasive species, the number of endangered species, and the Shannon-Wiener index. The index was derived from the mobile abundance data and a converted version of the sessile data, following the methodology of Deepananda and Macusi (2013), with the formula:
$$H = -\sum_{i} p_i \log(p_i)$$
where $H$ is the Shannon diversity index and $p_i$ is the proportion of individuals of the $i$th species in the population.
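As an illustration of the formula, the index can be computed from per-species counts in a single sample. This is a minimal sketch, not the team's code: the function name `shannon_index` is hypothetical, and the natural logarithm is an assumption, since the source formula does not state the base.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * log(p_i)) from per-species counts.

    Assumption: natural logarithm; the source does not specify the base.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    h = 0.0
    for c in counts:
        if c > 0:  # species absent from the sample contribute nothing
            p = c / total  # proportion of individuals of this species
            h -= p * math.log(p)
    return h
```

For a sample in which four species are equally abundant, the result equals `math.log(4)`, the maximum for four species; a sample with a single species gives 0.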
Fitting a SARIMA model to the derived Shannon index allowed the team to analyze the residuals to determine feature importance. For their product, the team chose a SARIMAX model incorporating the identified features, which achieved an RMSE of 0.20 and an MAE of 0.17.
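For readers unfamiliar with the two error metrics, the sketch below shows how RMSE and MAE are computed from observed and forecast values. The function names and the example numbers are made up for illustration and are not the team's data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more strongly."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the average size of the forecast error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical Shannon index values, for illustration only
observed = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
print(rmse(observed, forecast), mae(observed, forecast))
```

RMSE is never smaller than MAE on the same data, so an RMSE close to the MAE suggests that the errors are of similar size across the forecast period rather than dominated by a few large misses.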
Another team measured diversity with the Hill-Simpson metric, inspired by Roswell et al. (2021). After testing several models, the team selected a LASSO model to determine feature importance because of its strong regularization. They noted that none of the many weather-based variables they analyzed was a strong predictor of species diversity, but there was a trend: lower water temperature, humidity, precipitation and water vapor pressure deficit, together with higher cloud cover, were associated with more biodiversity.
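A minimal sketch of the Hill-Simpson metric, assuming the usual definition as the inverse Simpson concentration, 1 / Σ p_i². The function name is hypothetical and this is not the team's implementation.

```python
def hill_simpson(counts):
    """Hill-Simpson diversity: 1 / sum(p_i ** 2) from per-species counts.

    Assumption: the standard inverse-Simpson form of the metric.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    # Simpson concentration: probability that two random draws
    # (with replacement) belong to the same species
    concentration = sum((c / total) ** 2 for c in counts)
    return 1.0 / concentration
```

The result can be read as an "effective number of species": four equally abundant species give 4, while a sample dominated by one species gives a value close to 1, however many rare species are present.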
One team noticed that biodiversity in the AMPA correlated well with ocean health indexes for all of Portugal, and postulated that the slower-than-expected recovery of species in the AMPA might be driven by global rather than local factors. Invasive species were correlated with higher chlorine levels and endangered species with temperature-based features. In general, biodiversity seemed stable over the years, which was consistent with the domain experts' observation that their interventions had not produced the expected recovery.
Several teams noted that the vast majority of species occurred only rarely, with just a few species common across many samples, which made modeling difficult.
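One simple way to quantify this sparsity is the share of samples in which each species occurs at all. The sketch below assumes each sample is a mapping from species name to abundance; the function name and the species names in the usage example are placeholders, not taken from the AMPA data.

```python
def occupancy(samples):
    """Fraction of samples in which each species has abundance > 0.

    `samples` is a list of dicts mapping species name -> abundance
    (an assumed layout, chosen for illustration).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    presence = {}
    for sample in samples:
        for species, abundance in sample.items():
            if abundance > 0:
                presence[species] = presence.get(species, 0) + 1
    return {species: n / len(samples) for species, n in presence.items()}


# Placeholder data: species_a is common, species_b is rare
samples = [
    {"species_a": 12, "species_b": 0},
    {"species_a": 3},
    {"species_a": 7, "species_b": 1},
    {"species_a": 5},
]
print(occupancy(samples))
```

Species with very low occupancy yield time series that are mostly zeros, which is one reason models fitted to individual rare species tend to be unreliable.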
After testing conventional models, another team added a more complex approach: an LSTM followed by fully connected final layers, trained to predict the abundance of Cladophora sp. Smooth, a green alga (Figure 4). They trained the model both with and without five features: tide, weather condition, water temperature, season and moon phase. They noted that adding these five features improved training and validation performance but did not meaningfully improve the model's performance on the test set.
One team combined their feature selection and forecasting work on ocean pH in a dashboard built with Streamlit (Figure 4). Additionally, this team created an open-source Python package, beautiful-sea, aimed at scaling their findings to other marine ecosystems.
Another team created a dashboard showing many of their findings, such as feature importances derived from CatBoost, a gradient-boosted decision tree algorithm, and abundance data for individual species together with their conservation status.
A third team, after noticing that invasive species seem to thrive when the ocean gets warmer, looked into existing technology that could lower the sea temperature and proposed using shade balls. Shade balls are controversial in their original purpose of saving water, because manufacturing them requires a lot of water, but this team proposed them as an alternative use case in the interest of biodiversity. In any case, the focus on sea surface temperature is extremely relevant: at the time of writing, this metric has been exceptionally high, with as yet unknown impact on biodiversity and on marine and other ecosystems.
Using the insights from the teams, researchers and other interested parties can assess the biological plausibility of features that the modeling showed to be highly correlated with biodiversity measurements. In a second step, this work can serve as a basis for specific interventions aimed at protecting the AMPA and for raising public awareness of marine biodiversity, especially among the local population and tourists visiting the area.
One core finding was the correlation between ocean acidification and a lower Shannon diversity index, indicating less biodiversity. This is especially pertinent because climate change is causing the ocean to acidify, which suggests a potential direct link between the climate crisis and biodiversity in a local and well-studied marine area. Highlighting this connection to tourists and locals could increase climate change awareness and action by making the local impact tangible.