Compositional data are challenging to analyse because the sample space is constrained: the components must be non-negative and sum to one. In microbiome compositional data, many of the components are often highly right-skewed, with large numbers of zeros. A major limitation of currently available estimators for compositional models is that they either cannot handle many zeros in the data or are not computationally feasible in moderate to high dimensions. We derive a new set of score matching estimators applicable to distributions on a Riemannian manifold with boundary, of which the standard simplex is a special case. We apply the score matching method to estimate the parameters of a new flexible model for compositional data and show that the estimators are scalable and available in closed form. We then apply the new model and estimators to real microbiome compositional data and show that the model provides a good fit to the data.

School Seminar Series

A/Prof Janice Scealy

Fri, 25/03/2022 - 4:00pm