Modeling multi-species datasets is a challenging statistical problem because such datasets are high-dimensional and sparse. In this talk, I consider a recently developed method called Species Archetype Models (SAMs). As an extension of finite mixture models, SAMs cluster species based on their environmental response to form 'archetypal responses'. I show that SAMs significantly improve prediction for species, especially rare ones, owing to their ability to borrow strength across species in the same archetype.

I then propose two new methods for variable selection in SAMs and finite mixture models. First, I develop a new information criterion called AICmix. Unlike previously proposed criteria, which are based on the complete likelihood involving missing data, AICmix is based on the observed likelihood only. This leads to some desirable properties, as shown via theory and simulations. I then consider penalized likelihood methods and propose two new penalties for variable selection. Both penalties exploit the grouping structure inherent in the regression coefficients of mixture models. As a result, both penalties possess attractive theoretical properties, such as consistency and the oracle property, and outperform other methods of variable selection.


Francis Hui

University of New South Wales


Fri, 02/08/2013 - 4:00pm to 5:00pm


OMB-145, Old Main Building, UNSW Kensington Campus