Backfitting for large scale crossed random effects regressions

Abstract

Large scale genomic and electronic commerce data sets often have a crossed random effects structure, arising from genotypes x environments or customers x products. Naive methods of handling such data will produce inferences that do not generalize. Regression models that properly account for crossed random effects can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (2020) present a collapsed Gibbs sampler that costs O(N), but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N) under greatly relaxed though still strict sampling assumptions. Empirically, the backfitting algorithm costs O(N) under further relaxed assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.
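The abstract names backfitting as the route to a generalized least squares fit at O(N) cost per pass. What follows is a minimal sketch of the general idea, not the speakers' implementation. It assumes the simplest crossed model, y_ij = mu + a_i + b_j + e_ij, where i indexes rows (e.g. customers) and j indexes columns (e.g. products). It also assumes the variance ratios lam_a = sigma_e^2/sigma_a^2 and lam_b = sigma_e^2/sigma_b^2 are known. Under those assumptions, Henderson's mixed model equations imply that the mu minimizing sum (y - mu - a_i - b_j)^2 + lam_a*sum a_i^2 + lam_b*sum b_j^2 is the GLS estimate of mu. The sketch reaches that minimizer by cycling through three updates (intercept, then row effects, then column effects), and each update touches every observation once. The names `backfit_crossed` and `_update` are hypothetical.

```python
from collections import defaultdict


def _update(effect, keys, other_keys, y, mu, other, lam):
    """Set each effect[k] to a shrunken mean of its partial residuals.

    Solves effect[k] * (n_k + lam) = sum over k's observations of (y - mu - other),
    which is the exact coordinate minimizer. Returns the largest change made.
    """
    num = defaultdict(float)
    cnt = defaultdict(int)
    for k, m, v in zip(keys, other_keys, y):  # one O(N) pass
        num[k] += v - mu - other[m]
        cnt[k] += 1
    change = 0.0
    for k in num:
        new = num[k] / (cnt[k] + lam)
        change = max(change, abs(new - effect[k]))
        effect[k] = new
    return change


def backfit_crossed(rows, cols, y, lam_a, lam_b, max_sweeps=10_000, tol=1e-12):
    """Backfitting (block Gauss-Seidel) for y ~ mu + a[row] + b[col].

    Assumes lam_a, lam_b > 0 are known variance ratios, which makes the
    penalized criterion strictly convex, so the sweeps converge.
    Each sweep is O(N). The total cost is O(N) only if the number of
    sweeps stays bounded as N grows; this sketch does not guarantee that.
    """
    n = len(y)
    a = defaultdict(float)  # row effects, start at 0
    b = defaultdict(float)  # column effects, start at 0
    mu = 0.0
    for sweep in range(max_sweeps):
        # 1. Intercept: unpenalized mean of the partial residuals.
        new_mu = sum(v - a[i] - b[j] for i, j, v in zip(rows, cols, y)) / n
        delta = abs(new_mu - mu)
        mu = new_mu
        # 2. Row effects given mu and the current column effects.
        delta = max(delta, _update(a, rows, cols, y, mu, b, lam_a))
        # 3. Column effects given mu and the new row effects.
        delta = max(delta, _update(b, cols, rows, y, mu, a, lam_b))
        if delta < tol:
            break
    return mu, dict(a), dict(b), sweep + 1
```

Usage sketch: pass parallel lists of row ids, column ids, and responses, for example `backfit_crossed([0, 0, 1], [0, 1, 1], [1.0, 2.0, 3.0], 1.0, 1.0)`. When the design is complete and balanced (every row-column cell observed once), the returned mu equals the overall mean of y. In sparse, unbalanced designs the result generally differs from that mean, because the row and column effects absorb part of the signal.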

This is joint work with Swarnadip Ghosh and Trevor Hastie of Stanford University.

The talk's recording is available here. Because of an unforeseen internet outage during the talk, only about two-thirds of it was recorded.

Speaker

Prof Art Owen

Research Area

Statistics Across Campuses

Affiliation

Stanford University

Date

Thursday, 2 February 2023, 2pm

Venue

RC-4082 and Zoom (link below with passcode: 017349)

Zoom link
