Species distribution modelling uses records of where a species is known to occur to model how it relates to a suite of environmental variables.
This may be done to better understand the ecology of the target species, to predict its response to environmental change, or to assist conservation efforts. The modelling process involves many challenging steps, and we are currently exploring the following problems:
- MARGE - multivariate adaptive regression splines for correlated non-normal data (Jakub Stoklosa).
- Spatial confounding in point process models (Wesley Brooks).
- Fast methods to fit point process models to clustered data (Wesley Brooks, Elliot Dovers).
- Point process models for clustered data along stream networks (Wesley Brooks, with Jay verHoef, Erin Peterson).
- Point process models for multi-species point event data (Elliot Dovers, Wesley Brooks).
- Accounting for uncertainty in predictor variables (Jakub Stoklosa and Firouzeh Noghrehchi, with Chris Daly and Scott Foster).
- The use of shrinkage approaches for model selection and to improve predictive performance (Ian Renner and Francis Hui, with Scott Foster).
- Using field measurements of climate variables to improve predictive performance (Eve Slavich).
Previously we unified different methods for analysing presence-only data (pseudo-absences, point process models and MAXENT; Ian Renner and Leah Shepherd), and investigated model-based control of observer bias in presence-only analysis (Ian Renner with Dan Ramp).
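As a rough illustration of the point process view of presence-only data (not the group's own code), the log-likelihood for presence locations s_1, ..., s_n under a log-linear intensity λ(s) is Σ log λ(s_i) − ∫ λ(s) ds, and the integral can be approximated by a weighted sum over quadrature points. The sketch below assumes a unit-square study region, a single hypothetical covariate (the east coordinate), a simple midpoint-rule grid quadrature, and Newton-Raphson fitting. The function names `simulate_ipp`, `design` and `fit_ppm` are hypothetical.

```python
import numpy as np

def simulate_ipp(beta, rng):
    """Simulate an inhomogeneous Poisson process on the unit square by thinning.
    Log-intensity is beta[0] + beta[1] * x, where x is the east coordinate (hypothetical covariate)."""
    lam_max = np.exp(beta[0] + max(beta[1], 0.0))  # largest intensity for x in [0, 1]
    n = rng.poisson(lam_max)                       # unit square has area 1
    pts = rng.uniform(size=(n, 2))
    lam = np.exp(beta[0] + beta[1] * pts[:, 0])
    keep = rng.uniform(size=n) < lam / lam_max     # retain each point with prob lam / lam_max
    return pts[keep]

def design(points):
    # Intercept plus the east coordinate as the only covariate
    return np.column_stack([np.ones(len(points)), points[:, 0]])

def fit_ppm(presence, grid_size=50, n_iter=25):
    # Quadrature points: centres of a regular grid, each weighted by its cell area
    g = (np.arange(grid_size) + 0.5) / grid_size
    quad = np.array([(u, v) for u in g for v in g])
    w = np.full(len(quad), 1.0 / grid_size**2)
    Xp, Xq = design(presence), design(quad)
    beta = np.array([np.log(len(presence)), 0.0])  # start at the homogeneous fit
    for _ in range(n_iter):
        mu = w * np.exp(Xq @ beta)                 # quadrature-weighted intensity
        grad = Xp.sum(axis=0) - Xq.T @ mu          # score of the approximate log-likelihood
        info = Xq.T @ (mu[:, None] * Xq)           # negative Hessian
        beta = beta + np.linalg.solve(info, grad)  # Newton-Raphson step
    return beta

rng = np.random.default_rng(1)
presence = simulate_ipp((5.0, 2.0), rng)
beta_hat = fit_ppm(presence)
print(len(presence), beta_hat)
```

With a few hundred simulated presences the estimated coefficients should land near the values (5, 2) used to simulate; a finer grid tightens the quadrature approximation at the cost of more computation.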
Software
- Software for species distribution modelling using presence-only data, offering a range of penalised estimation options (LASSO, elastic net, etc.) to avoid over-fitting and implicitly perform model selection.
- A Matlab toolbox developed for Forster & Warton (2007).
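The software itself is not reproduced here. As a hypothetical illustration of how a LASSO penalty can both shrink coefficients and drop covariates, the sketch below fits a LASSO-penalised Poisson regression to simulated counts of presence records in equal-area grid cells (a discretised point process model) by proximal gradient descent with backtracking. The intercept is left unpenalised, the penalty `lam` is fixed by hand for illustration, and the function names and data are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |.|: shrinks towards zero, exactly zero inside [-t, t]
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mean_poisson_nll(beta, X, y):
    eta = X @ beta
    return np.mean(np.exp(eta) - y * eta)  # drops the constant log(y!) term

def lasso_poisson(X, y, lam, n_iter=500):
    """LASSO-penalised Poisson regression by proximal gradient descent.
    Column 0 of X is the intercept and is left unpenalised."""
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())  # intercept-only starting value
    step = 1.0
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ beta) - y) / n
        f_old = mean_poisson_nll(beta, X, y)
        while True:  # backtracking line search on the smooth part of the objective
            z = beta - step * grad
            new = soft_threshold(z, step * lam)
            new[0] = z[0]  # plain gradient step for the intercept
            diff = new - beta
            if mean_poisson_nll(new, X, y) <= f_old + grad @ diff + diff @ diff / (2 * step):
                break
            step *= 0.5
        beta = new
    return beta

# Hypothetical data: record counts in 1600 equal-area cells,
# six standard-normal covariates of which only the first two matter.
rng = np.random.default_rng(0)
n_cells, n_cov = 1600, 6
X = np.column_stack([np.ones(n_cells), rng.standard_normal((n_cells, n_cov))])
beta_true = np.array([0.5, 0.6, -0.4, 0.0, 0.0, 0.0, 0.0])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = lasso_poisson(X, y, lam=0.15)
print(np.round(beta_hat, 3))
```

The irrelevant covariates are set exactly to zero while the two informative ones are retained (shrunk slightly towards zero), which is the sense in which penalised estimation performs model selection implicitly. In practice the penalty would be tuned rather than fixed.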
Project ideas
Efficient algorithms for fitting latent variable models
Models that include latent variables are invaluable in ecology for a number of applications. Beyond models for surveys with multiple sampling units, we have recently been interested in using them to visualise and model multivariate data, and to account for errors in predictor variables. How can we fit such models more efficiently? We will explore several options centred on extensions of the Monte Carlo EM algorithm and on efficient Laplace and variational approximation algorithms.
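To make the Laplace approximation concrete, here is a minimal sketch for a toy latent variable model: Poisson counts from one cluster sharing a normal random intercept u, so that y_i ~ Poisson(exp(β + u)) and u ~ N(0, σ²). The marginal likelihood ∫ p(y, u) du is approximated by locating the mode of log p(y, u) with Newton's method and matching a Gaussian to its curvature there, then checked against brute-force numerical integration. The model, data and function names are hypothetical.

```python
import math
import numpy as np

def log_joint(u, y, beta, sigma):
    """log p(y, u) for y_i ~ Pois(exp(beta + u)), u ~ N(0, sigma^2); u may be a scalar or array."""
    u = np.asarray(u, dtype=float)
    eta = beta + u
    log_fact = sum(math.lgamma(v + 1) for v in y)
    loglik = y.sum() * eta - len(y) * np.exp(eta) - log_fact
    logprior = -0.5 * u**2 / sigma**2 - 0.5 * math.log(2 * math.pi * sigma**2)
    return loglik + logprior

def laplace_log_marginal(y, beta, sigma, n_iter=50):
    u = 0.0
    for _ in range(n_iter):
        grad = y.sum() - len(y) * math.exp(beta + u) - u / sigma**2
        hess = -len(y) * math.exp(beta + u) - 1.0 / sigma**2
        u -= grad / hess  # Newton step towards the mode of log p(y, u)
    hess = -len(y) * math.exp(beta + u) - 1.0 / sigma**2
    # log of integral of exp(h) is approximately h(u_hat) + 0.5 * log(2 * pi / -h''(u_hat))
    return float(log_joint(u, y, beta, sigma)) + 0.5 * math.log(2 * math.pi / -hess)

def quadrature_log_marginal(y, beta, sigma, lo=-10.0, hi=10.0, n=20001):
    # Brute-force Riemann sum on a fine grid, computed stably on the log scale
    grid = np.linspace(lo, hi, n)
    h = log_joint(grid, y, beta, sigma)
    hmax = h.max()
    return float(hmax + np.log(np.exp(h - hmax).sum() * (grid[1] - grid[0])))

y = np.array([3, 5, 2, 4, 6, 3, 4, 5, 2, 4])  # hypothetical counts from one cluster
beta, sigma = math.log(3.0), 1.0
approx = laplace_log_marginal(y, beta, sigma)
exact = quadrature_log_marginal(y, beta, sigma)
print(approx, exact)
```

Brute-force integration is only feasible here because there is a single latent variable. With many latent variables per site or species, cheap approximations such as Laplace's become attractive, and their accuracy is one of the trade-offs to explore.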
Advances in errors-in-variables modelling
Predictor variables (in ecology and elsewhere) are often measured with error, and failing to account for this biases the parameter estimates of the fitted model and, often, its subsequent predictions. We have been developing easy-to-use algorithms for modelling such data, initially focusing on generalised linear models where the measurement error is independent across observations. Important open questions include how to extend these methods to spatially correlated measurement error, and how to generalise them beyond GLMs to general predictive models.
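The bias described above can be seen in a small simulation. The sketch below assumes classical additive measurement error (observed w = x + u, with the error variance known) in a simple linear model, and corrects it with regression calibration: x is replaced by an estimate of E[x | w] before fitting. Regression calibration is a standard textbook correction swapped in for illustration; it is not necessarily the approach taken in this project, and the function names and data are hypothetical.

```python
import numpy as np

def ols(x, y):
    """Intercept and slope from ordinary least squares."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def regression_calibration(w, y, sigma_u2):
    """Correct for classical error w = x + u with Var(u) = sigma_u2 assumed known."""
    var_w = w.var(ddof=1)
    reliability = (var_w - sigma_u2) / var_w          # estimate of Var(x) / Var(w)
    x_hat = w.mean() + reliability * (w - w.mean())   # E[x | w] under joint normality
    return ols(x_hat, y)

rng = np.random.default_rng(42)
n = 5000
x = rng.standard_normal(n)                # true predictor (never observed)
w = x + rng.normal(0.0, np.sqrt(0.5), n)  # observed with error variance 0.5
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

naive = ols(w, y)
corrected = regression_calibration(w, y, sigma_u2=0.5)
print("naive slope:", naive[1], "corrected slope:", corrected[1])
```

Here the naive slope is attenuated towards zero by roughly the reliability ratio (about 2/3), while the calibrated slope recovers a value near 2. Extending such corrections to spatially correlated errors, where u is no longer independent across observations, is exactly where this simple recipe stops applying directly.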