Bayesian methods are attractive for analyzing large-scale data due in part to their coherent uncertainty quantification, ability to model complex phenomena, and ease of incorporating expert information. Many standard Bayesian inference algorithms are computationally expensive, however, so their direct application to large datasets can be difficult or infeasible. Other standard algorithms sacrifice accuracy in the pursuit of scalability. We take a new approach. Namely, we leverage the insight that data often exhibit approximate redundancies to obtain a weighted subset of the data (called a "coreset") that is much smaller than the original dataset. We can then use this small coreset as input to existing Bayesian inference algorithms without modification. We provide theoretical guarantees on the size and approximation quality of the coreset. In particular, we show that our method provides geometric decay in posterior approximation error as a function of coreset size. We validate the approach on both synthetic and real datasets, demonstrating that our method reduces posterior approximation error by orders of magnitude relative to uniform random subsampling.
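
To make the idea of a weighted subset concrete, the following is a minimal sketch of the uniform random subsampling baseline mentioned in the abstract, not the coreset construction presented in the talk. It assumes a toy Gaussian model with known variance; the function names (gaussian_loglik, weighted_loglik, uniform_coreset) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_loglik(x, mu, sigma=1.0):
    """Per-observation log-likelihood under N(mu, sigma^2) (toy model, assumed)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2


def weighted_loglik(points, weights, mu):
    """Weighted sum of per-point log-likelihoods: sum_n w_n * log p(x_n | mu)."""
    return float(np.sum(weights * gaussian_loglik(points, mu)))


def uniform_coreset(data, size, rng):
    """Baseline only: uniform random subsample, each kept point weighted N / size."""
    idx = rng.choice(len(data), size=size, replace=False)
    weights = np.full(size, len(data) / size)
    return data[idx], weights


rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=10_000)  # synthetic data (assumed)

points, weights = uniform_coreset(data, size=100, rng=rng)

# Full-data log-likelihood is the special case where every weight equals 1.
full = weighted_loglik(data, np.ones(len(data)), mu=2.0)
approx = weighted_loglik(points, weights, mu=2.0)
print(f"full: {full:.1f}  weighted subset: {approx:.1f}")
```

Any inference routine that accepts per-observation weights could take (points, weights) in place of the full dataset. The method described in the abstract differs in how it selects the points and sets their weights, which is where the reported accuracy gains over this baseline come from.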


Tamara Broderick

Massachusetts Institute of Technology (MIT)


Fri, 18/01/2019 - 3:00pm


Chemical Sciences M11 (building F10), UNSW