Genetic association studies typically have up to 10^7 genetic markers and between 10^3 and 10^5 study subjects, so they often face a “too many predictors” problem that requires further assumptions. Sparsity assumptions were popular in the past, but they were never justified, and we now appreciate that nature is far from sparse. The reality of large numbers of weak causal effects means that methods to deal with overfitting must be chosen carefully. I will show that the assumptions underlying current widely-used software are problematic, and describe our alternative model. Given the size of current datasets and concerns about the privacy of genetic information, many studies make available only summary statistics of association, not individual genotype data. A recently-developed method for heritability analyses using only summary statistics is subject to a similar criticism, and we propose a similar remedy. For some analyses our new results are strikingly different from those previously published, with major implications for our understanding of genetic architecture and for the priorities of future studies.
This is joint work with Doug Speed, of Aarhus Institute of Advanced Studies, Denmark.