Cory Merow, Brian Enquist, Brad Boyle, Naia Morueta-Holme, Jens-Christian Svenning
This document provides a brief overview of the methods used to develop range models for the BIEN3.0 database so that users can judge their adequacy for their own applications.
Data used for range modeling - As part of the BIEN workflow, all occurrence records were filtered and standardized with the following protocols:
Environmental covariates - Range models were constructed for each species using environmental layers and spatial constraints. The environmental layers were obtained from WorldClim at 5 arc-minute resolution (Hijmans et al. 2005) and projected to a 10 km resolution. Predictors included mean annual temperature, mean diurnal temperature range, annual precipitation, precipitation seasonality, the fraction precipitation in warmest quarter / (precipitation in warmest quarter + precipitation in coldest quarter), and five spatial eigenvectors. The spatial eigenvectors corresponded to large-scale regional differences and primarily served to limit predictions far from known presence locations in geographic space (Diniz-Filho & Bini, 2005). Where a cell contained multiple occurrence records, only one record per cell was used for model building.
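The two data-preparation steps described above (keeping one record per cell and forming the warmest-quarter precipitation fraction) can be sketched as follows. This is a minimal illustration, not the BIEN code: the function names are hypothetical, and the grid is assumed to be a regular longitude/latitude grid with 5 arc-minute cells, whereas the actual layers were projected to 10 km.

```python
import math


def cell_index(lon, lat, cell_size_deg):
    """Return the (column, row) index of the grid cell containing a point.

    Assumes a regular lon/lat grid with its origin at (-180, -90).
    """
    col = math.floor((lon + 180.0) / cell_size_deg)
    row = math.floor((lat + 90.0) / cell_size_deg)
    return col, row


def one_record_per_cell(records, cell_size_deg=5.0 / 60.0):
    """Keep the first (lon, lat) record encountered in each grid cell."""
    kept = {}
    for lon, lat in records:
        # setdefault leaves the stored record unchanged if the cell is taken
        kept.setdefault(cell_index(lon, lat, cell_size_deg), (lon, lat))
    return list(kept.values())


def warm_quarter_precip_fraction(p_warm, p_cold):
    """Warmest-quarter precipitation as a fraction of warmest + coldest."""
    total = p_warm + p_cold
    # Undefined when both quarters are completely dry
    return p_warm / total if total > 0 else float("nan")


# Hypothetical occurrence records; the first two share a cell
records = [(-110.01, 32.01), (-110.02, 32.02), (-109.53, 32.01)]
thinned = one_record_per_cell(records)
```

Which record is retained within a cell is an arbitrary choice here (the first one seen); the source does not say how the retained record was chosen.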
Range modeling decision tree - Different range estimation methods were used depending upon the sample size of (unique) presence locations.
Maxent model settings were chosen to balance overfitting (underestimating range sizes) against underfitting (excessively smooth models that overpredict range size), generally following recommendations in Merow et al. (2013, 2014). Only linear, quadratic, and product features were used, and regularization was set at the default value. Maxent’s continuous predictions were converted to binary presence/absence predictions by choosing a threshold based on the 75th percentile of the cumulative output (based on analyses validated with 700 species for which expert maps were available; Morueta-Holme et al., in prep).
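To make the thresholding step concrete, the sketch below converts raw Maxent-style values to cumulative output and then to a binary map. It relies on the standard definition of Maxent's cumulative format (for each cell, 100 times the summed raw value of all cells with raw value no greater than that cell's). The function names are hypothetical, and the threshold is left as a parameter because this document does not spell out how the 75th-percentile rule was computed.

```python
def cumulative_output(raw):
    """Convert raw values to Maxent-style cumulative values on a 0-100 scale.

    Each cell's cumulative value is 100 * (sum of raw values of all cells
    whose raw value is <= this cell's raw value) / (sum of all raw values).
    """
    total = sum(raw)
    order = sorted(range(len(raw)), key=lambda i: raw[i])
    cumulative = [0.0] * len(raw)
    running = 0.0
    i = 0
    while i < len(order):
        # Accumulate all cells tied at the same raw value before assigning
        j = i
        while j < len(order) and raw[order[j]] == raw[order[i]]:
            running += raw[order[j]]
            j += 1
        for k in range(i, j):
            cumulative[order[k]] = 100.0 * running / total
        i = j
    return cumulative


def to_binary(values, threshold):
    """Mark cells at or above the threshold as present (1), others absent (0)."""
    return [1 if v >= threshold else 0 for v in values]


# Hypothetical raw values for four cells and an illustrative threshold
raw = [0.1, 0.2, 0.3, 0.4]
cum = cumulative_output(raw)
binary = to_binary(cum, threshold=50.0)
```

The threshold of 50 is purely illustrative and is not the value used for the BIEN ranges.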
Automating range model building - All range models were run at the Texas Advanced Computing Center (TACC), producing approximately 90,000 ranges.
Caveats - Modeling ranges for ~90,000 species is not without potential flaws, and some caveats should be recognized. Notably, sample size remains small for the vast majority of species, with over 50% of the species represented by 5 or fewer occurrences. Consequently, many ranges are estimated using somewhat coarse methods (i.e. not from species distribution models). In addition, it is impossible to automatically detect all problematic, outlying, or non-natural occurrence records, and those that remain may influence range predictions.
Given our attempts to avoid overfitting, the species distribution models are more likely to underfit spatial distribution patterns and consequently may predict ranges larger than those realized for some species. That is, the models may predict suitable habitat in locations that are inaccessible to the species (but in similar environmental conditions to where they occur) or predict suitable habitat slightly beyond realized range edges due to fitting relatively smoothed response curves. To offset this, cells where presence was predicted by Maxent farther than 1000 km from any presence record were removed from the range. The modeling did not account for variation in sampling effort or detection probability.
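The 1000 km distance filter can be illustrated with a short sketch. It assumes great-circle (haversine) distances on a spherical Earth between cell centers and presence records, both given as (longitude, latitude) pairs; the source does not state how distance was measured, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def clip_to_presences(predicted_cells, presences, max_km=1000.0):
    """Drop predicted-presence cells farther than max_km from every record."""
    return [
        cell for cell in predicted_cells
        if any(haversine_km(cell[0], cell[1], p[0], p[1]) <= max_km
               for p in presences)
    ]


# Hypothetical example: one presence record and two predicted cells
presences = [(0.0, 0.0)]
predicted = [(5.0, 0.0), (10.0, 0.0)]  # roughly 556 km and 1112 km away
clipped = clip_to_presences(predicted, presences)
```

This brute-force comparison is quadratic in the number of cells and records; a spatial index would be a natural substitute at the scale of tens of thousands of species.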
As with any range map, our predictions represent hypotheses about spatial occurrence patterns. In spite of these caveats, predictions for the vast majority of species are reliable and are well-suited for macroecological analyses.
Updates - Our range modeling efforts are a dynamic enterprise and we are constantly exploring ways to improve predictions, leading to periodic updates in our database. Planned updates include addition of new occurrence data, addition of new information on native versus introduced range, choosing optimal model settings tuned specifically for each species, accounting for sampling variation, and improving occurrence data cleaning methods. We will employ version control to maintain accessibility of all past versions as updates are released.
References