Ensembles of Trees and CLT's: Inference and Machine Learning

Abstract:

This talk develops methods of statistical inference based on ensembles of decision trees: bagging, random forests, and boosting. Recent results have shown that when the bootstrap procedure in bagging methods is replaced by sub-sampling, predictions from these methods can be analyzed using the theory of U-statistics, which have a limiting normal distribution. Moreover, the limiting variance can be estimated within the sub-sampling structure.
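
As a rough illustration of the sub-sampling idea, the following Python sketch averages depth-1 regression trees (stumps) fitted to sub-samples drawn without replacement, and estimates the variance of the ensemble prediction from the same trees. It is not taken from the talk: the function names, the use of stumps on a single feature, the simulated data, and the variance formula k^2 * zeta_1 / n + zeta_k / m (with zeta_1 estimated from groups of sub-samples that share one fixed observation) are all assumptions made here for illustration.

```python
import random
from statistics import mean, variance


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature; return a predict function."""
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # all x identical: fall back to the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def subsample_prediction(xs, ys, x0, k, rng, include):
    """Predict at x0 from a stump trained on k points (no replacement),
    forcing observation `include` into the sub-sample."""
    others = [i for i in range(len(xs)) if i != include]
    idx = [include] + rng.sample(others, k - 1)
    return fit_stump([xs[i] for i in idx], [ys[i] for i in idx])(x0)


def ensemble_with_variance(xs, ys, x0, k=20, n_z=25, n_mc=40, seed=0):
    """Return (ensemble prediction at x0, estimated variance of that prediction).

    Assumed form: Var ~ k^2 * zeta_1 / n + zeta_k / m, where m is the
    total number of trees. This is a sketch, not a reference implementation.
    """
    rng = random.Random(seed)
    n = len(xs)
    all_preds, group_means = [], []
    for _ in range(n_z):
        z = rng.randrange(n)  # observation shared by every sub-sample in this group
        preds = [subsample_prediction(xs, ys, x0, k, rng, z) for _ in range(n_mc)]
        group_means.append(mean(preds))
        all_preds.extend(preds)
    zeta_1 = variance(group_means)  # variability due to a single shared observation
    zeta_k = variance(all_preds)    # variability between individual trees
    m = len(all_preds)
    return mean(all_preds), (k ** 2 / n) * zeta_1 + zeta_k / m


# Simulated data (hypothetical): a noisy linear signal on [0, 1].
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 1) for _ in range(200)]
ys = [2.0 * x + data_rng.gauss(0, 0.1) for x in xs]

estimate, var_hat = ensemble_with_variance(xs, ys, x0=0.5)
half_width = 1.96 * var_hat ** 0.5  # normal-approximation 95% interval
print(f"prediction at 0.5: {estimate:.3f} +/- {half_width:.3f}")
```

The interval in the last lines relies on the normal approximation; with so few groups the estimate of zeta_1 is itself noisy, so the numbers should be read as indicative only.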

Using this result, we can compare the predictions made by a model learned with a feature of interest to those made by a model learned without it, and ask whether the differences between them could have arisen by chance. By evaluating the model at a structured set of points, we can also ask whether it differs significantly from an additive model. We demonstrate these results in an application to citizen-science data collected by Cornell's Laboratory of Ornithology.

Time permitting, we will examine recent developments that extend distributional results to boosting-type estimators. Boosting allows trees to be incorporated into more structured regression models, such as additive or varying-coefficient models, and often outperforms bagging by reducing bias.

This seminar is organised by ANU.

JOIN THE MEETING VIA: https://anu.zoom.us/j/425258947

Speaker

Professor Giles Hooker

Research Area

Statistics Seminar

Affiliation

ANU

Date

Thu, 14/05/2020 - 10:00am

Venue

https://anu.zoom.us/j/425258947
