Species Distribution Modeling
Introduction to Species Distribution Modeling
Developing species distribution models (SDMs) is an important analytical technique in conservation planning. Among other uses, SDMs can help predict the locations of rare and threatened plant and animal species, model the potential spread of invasive species, and provide a comprehensive set of distribution maps for use in conservation prioritization. In the context of climate change, species distribution modeling can be used to generate predictions of suitable habitat, assuming the species' niche has migrated in geographical space with the change in climate.
Datasets for Species Distribution Modeling
A workflow for many types of species distribution modeling involves assembling a stack of relevant environmental variables, collecting a species observation dataset, developing a statistical model that expresses the relationship between known observations and environmental variables, and then mapping this relationship to geographical space. This slide deck on assembling SDM model input data outlines many of the key issues in gathering an environmental data stack. To be used in SDMs, environmental data layers are usually provided in, or converted to, a raster data format. Because they are meant to reflect the causal forces determining a species' distribution, the environmental variables should bear a direct or indirect relationship to the species' physiology or the resources it needs. Types of environmental variables may include climate, topography, edaphic factors, land use, the distributions of other species, and disturbances. Many of these environmental variables are highly correlated with one another. If the aim of the modeling is ecological understanding in addition to spatial prediction, it may be useful to reduce the number of highly correlated variables.
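One simple way to reduce the number of highly correlated variables is to compute pairwise correlations at a set of sample locations and keep a variable only if it is not strongly correlated with one already kept. The following is a minimal sketch of that idea, not a prescribed method: the variable names, the sample values, and the 0.7 cutoff are all illustrative assumptions, and the greedy order-dependent selection is only one of several reasonable strategies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(layers, threshold=0.7):
    """Greedily keep variables whose |r| with every kept variable is below threshold.

    `layers` maps a variable name to its values sampled at the same locations.
    Variables are visited in insertion order, so list preferred variables first.
    The 0.7 default is an assumed cutoff for illustration only.
    """
    kept = []
    for name, values in layers.items():
        if all(abs(pearson(values, layers[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values sampled at six locations (illustrative, not real data).
layers = {
    "mean_temp": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "elevation": [2000.0, 1700.0, 1350.0, 1000.0, 700.0, 300.0],
    "soil_ph": [6.5, 5.0, 7.2, 5.5, 6.9, 5.2],
}

selected = drop_correlated(layers)
print(selected)  # ['mean_temp', 'soil_ph']
```

Here elevation is dropped because it is almost perfectly (negatively) correlated with mean temperature. Which member of a correlated pair to keep is an ecological judgment, so in practice the ordering of the variables deserves more thought than this sketch gives it.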
Observation data for an SDM can come from many different sources, and an analyst needs to identify the most comprehensive datasets available for a particular taxon and geographic region. Commonly used observation datasets include GBIF and eBird. It is important to know whether the observation dataset includes absences (for example, from a systematic survey based on a plot design) or is a presence-only dataset. Typically, a model developed from an observation dataset with true absences will carry fewer biases than one generated from a presence-only dataset. Still, it is feasible to create models with presence-only data; often this is done by simulating absences (pseudoabsences), generating random points across the geographic region.
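Simulating absences by drawing random points across a region can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the bounding box and the number of points are hypothetical, the region is treated as a plain longitude/latitude rectangle, and sampling uniformly in degrees is not exactly uniform in area.

```python
import random


def sample_pseudo_absences(bbox, n_points, seed=0):
    """Draw n_points uniformly at random inside a bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    A fixed seed keeps the sample reproducible between runs.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    rng = random.Random(seed)
    return [
        (rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
        for _ in range(n_points)
    ]


# Hypothetical study region, given only as an illustrative rectangle.
study_bbox = (-124.0, 36.0, -119.0, 40.0)
pseudo_absences = sample_pseudo_absences(study_bbox, n_points=500)
print(len(pseudo_absences))  # 500
```

A real analysis would usually restrict the points to the study area's actual outline (for example, excluding ocean cells) and might also exclude points that fall too close to known presences; both refinements are left out here.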
Algorithms and Evaluating Models
A wide range of algorithms is used in species distribution modeling. Most approaches are drawn from the field of statistical learning. The basic technique is to derive a statistical model, based on environmental associations, that separates known occurrences from species absences (or, quite often, pseudoabsences if it is a presence-only model). Examples of algorithms include generalized linear models, classification trees, regression trees, boosted regression trees, multivariate adaptive regression splines, and maximum entropy methods (e.g., Maxent). Further references on different algorithms are contained in the Species Distribution Models section of the workshop reading list.
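To make the basic technique concrete, the sketch below fits a logistic regression (a generalized linear model with a logit link) that separates presences from absences using two environmental predictors. It is a hand-rolled illustration, not how any particular SDM package works: the predictor values are hypothetical and assumed to be standardized, the fitting is plain batch gradient descent with an assumed learning rate and iteration count, and a real analysis would normally rely on an established statistics library.

```python
from math import exp


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def predict_suitability(weights, row):
    """Predicted probability of presence; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))


def fit_logistic(X, y, learning_rate=0.1, n_iter=2000):
    """Fit intercept and coefficients by batch gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    weights = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, label in zip(X, y):
            err = predict_suitability(weights, row) - label
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical standardized predictors at nine sites: [temperature, precipitation].
X = [
    [-1.5, -1.0], [-1.0, -0.5], [-0.5, 0.2], [0.3, 0.5],               # absences
    [0.0, -0.8], [0.8, 0.1], [1.2, 1.0], [1.6, 0.7], [-0.1, 0.35],     # presences
]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

weights = fit_logistic(X, y)
warm_site = predict_suitability(weights, [1.5, 0.0])
cool_site = predict_suitability(weights, [-1.5, 0.0])
print(warm_site > cool_site)  # True
```

Applying `predict_suitability` to the environmental values of every raster cell is what turns the fitted relationship into a map of predicted suitability.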
Evaluating the results of the distribution model is an important final step in producing a model. This is accomplished through visual inspection of maps of the model's output, assessment of the ecological plausibility of the key environmental variables identified in the model, and statistical tests that make use of known occurrence points that have been set aside for comparison. Fielding and Bell 1997 and Liu et al. 2005 provide examples of such statistical tests.
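As an illustration of a statistical check on held-out points, the sketch below computes the area under the ROC curve (AUC) together with sensitivity and specificity at a chosen threshold. These are common measures for presence/absence models, but the sketch is not meant to reproduce the specific tests in the papers cited above; the predicted scores and the 0.5 threshold are hypothetical.

```python
def auc(scores_presence, scores_absence):
    """Probability that a random presence outscores a random absence.

    Ties count as half a win, which equals the area under the ROC curve.
    """
    wins = 0.0
    for sp in scores_presence:
        for sa in scores_absence:
            if sp > sa:
                wins += 1.0
            elif sp == sa:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


def sensitivity_specificity(scores_presence, scores_absence, threshold):
    """Share of presences predicted present, and of absences predicted absent."""
    sens = sum(s >= threshold for s in scores_presence) / len(scores_presence)
    spec = sum(s < threshold for s in scores_absence) / len(scores_absence)
    return sens, spec


# Hypothetical model predictions at sites set aside from model fitting.
held_out_presence = [0.9, 0.8, 0.6, 0.4]
held_out_absence = [0.7, 0.3, 0.2, 0.1]

print(auc(held_out_presence, held_out_absence))  # 0.875
print(sensitivity_specificity(held_out_presence, held_out_absence, 0.5))  # (0.75, 0.75)
```

AUC does not depend on a threshold, while sensitivity and specificity do, so the choice of threshold for converting suitability scores into predicted presence or absence needs its own justification.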
Further Resources
The CA LCC workshop held in September 2013 on species distribution modeling and conservation planning gave participants a good overview of many of the techniques and issues involved in constructing and evaluating SDMs. The set of resources from this workshop provides a number of presentations, lab exercises, and reading materials on distribution modeling that are worth further study. Also, an excellent recent overview of the field is provided in the Elith and Leathwick 2009 review paper.
11/2013