Abstract
The recent availability of species occurrence data from numerous sources, standardized and connected within a single portal, has the potential to answer fundamental ecological questions. However, these large aggregated biodiversity databases are prone to numerous data errors and biases. The data user is responsible for identifying these errors and assessing whether the data are suitable for a given purpose. Handling and cleaning biodiversity data increasingly requires complex technical skills, yet biodiversity scientists who possess these skills are rare. Here, we estimate the effect of user-level data cleaning on species distribution model (SDM) performance. We implement several simple, easy-to-execute data cleaning procedures and evaluate the resulting change in SDM performance. Additionally, we examine whether a certain group of species is more sensitive to the use of erroneous or unsuitable data. The cleaning procedures used in this research improved SDM performance significantly, across all scales and for all performance measures. The largest improvement in distribution models following data cleaning was for small mammals (1 g-100 g). Data cleaning at the user level is crucial when using aggregated occurrence data, and facilitating its implementation is key to advancing data-intensive biodiversity studies. Adopting a more comprehensive approach to incorporating data cleaning as part of data analysis will not only improve the quality of biodiversity data, but will also encourage more appropriate use of such data.
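The abstract does not list the specific cleaning procedures, so the following is only a minimal sketch of what simple user-level cleaning of occurrence records might look like, not the authors' method. It assumes each record is a dictionary with hypothetical `species`, `lat`, and `lon` keys, and it applies four commonly used checks: missing coordinates, out-of-range coordinates, the (0, 0) placeholder point, and duplicate records.

```python
def clean_occurrences(records):
    """Return the records that pass basic coordinate and duplicate checks.

    `records` is an iterable of dicts with the (assumed, hypothetical)
    keys "species", "lat" and "lon"; coordinates are decimal degrees.
    """
    seen = set()
    cleaned = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")

        # 1. Drop records with missing coordinates.
        if lat is None or lon is None:
            continue

        # 2. Drop coordinates outside the valid geographic range.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue

        # 3. Drop the (0, 0) point, often a placeholder for unknown location.
        if lat == 0 and lon == 0:
            continue

        # 4. Drop duplicates: same species at the same rounded location.
        key = (rec.get("species"), round(lat, 4), round(lon, 4))
        if key in seen:
            continue
        seen.add(key)

        cleaned.append(rec)
    return cleaned
```

The four-decimal rounding used for duplicate detection is an arbitrary choice for illustration; a real analysis would pick a tolerance that matches the resolution of the environmental layers used in the SDM.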
Original language | English |
---|---|
Pages (from-to) | 139-145 |
Number of pages | 7 |
Journal | Ecological Informatics |
Volume | 34 |
State | Published - 1 Jul 2016 |
Keywords
- Australian mammals
- Big-data
- Biodiversity informatics
- Data-cleaning
- MaxEnt
- SDM performance
All Science Journal Classification (ASJC) codes
- Ecology, Evolution, Behavior and Systematics
- Ecology
- Modelling and Simulation
- Ecological Modelling
- Computer Science Applications
- Computational Theory and Mathematics
- Applied Mathematics