Abstract: 

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, even though the presence of several causal variants is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study, to identify the subset that best predicts disease outcome, is now feasible thanks to developments in stochastic search methods. We employ a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant and recessive contributions to disease risk. Posterior mode estimates are obtained for regression coefficients that are each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate is interpreted as corresponding to a significant SNP. We investigate two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derive an explicit approximation for type-I error that avoids the need to employ permutation procedures. As well as genome-wide analyses, our method is well-suited to fine-mapping with very dense SNP sets obtained from resequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes and covariate adjustment, and it can be extended to search for interactions. We demonstrate the method using simulated case-control data sets of up to 500K SNPs, a real genome-wide data set of 300K SNPs, and a sequence-based data set, each of which can be analysed in a few hours on a desktop workstation. The talk is based on PLoS Genetics 4(7): e1000130, doi:10.1371/journal.pgen.1000130, and on recent work. This is joint work with Clive Hoggart, John Whittaker, and Maria De Iorio.
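To illustrate two ideas from the abstract, the following is a minimal sketch and not the speaker's implementation. It assumes genotypes are stored as minor-allele counts (0, 1 or 2) and uses one common coding for additive, dominant and recessive terms. It also assumes, for simplicity, a single coefficient with a unit-variance Gaussian likelihood and a double-exponential (Laplace) prior, for which the posterior mode is the familiar soft-thresholding rule; the normal-exponential-gamma prior mentioned above is not implemented here. The function names and the parameter `lam` are hypothetical.

```python
def encode_genotype(g):
    """Return (additive, dominant, recessive) codes for a minor-allele count g.

    Assumed coding: additive = allele count, dominant = at least one copy,
    recessive = two copies.
    """
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    return (g, int(g >= 1), int(g == 2))


def laplace_posterior_mode(z, lam):
    """Maximise -(b - z)**2 / 2 - lam * abs(b) over b.

    z is the unpenalised estimate and lam > 0 is the prior rate (assumed).
    The sharp peak of the prior at zero makes the mode exactly zero
    whenever abs(z) <= lam; otherwise z is shrunk towards zero by lam.
    """
    if abs(z) <= lam:
        return 0.0
    return z - lam if z > 0 else z + lam


if __name__ == "__main__":
    for g in (0, 1, 2):
        print(g, encode_genotype(g))
    for z in (0.3, 2.0, -2.0):
        print(z, laplace_posterior_mode(z, 0.5))
```

In this toy setting, a weak signal is estimated as exactly zero (the SNP is not selected), while a strong signal yields a non-zero, shrunken coefficient, which mirrors the interpretation of non-zero estimates given in the abstract.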

About the speaker: David Balding is Professor of Statistical Genetics at the Department of Epidemiology and Public Health, Imperial College, London. His current interests are in genetics and genomics and in public and international health.

Speaker

Professor David Balding

Research Area

Statistics Seminar

Affiliation

Imperial College London

Date

Fri, 22/05/2009 - 4:00pm

Venue

RC-4082