Bayesian Structured Sparsity for Genetic Association Mapping

Barbara Engelhardt

In the genomic sciences, the amount of data has grown faster than the statistical methodologies needed to analyze it. Furthermore, the complex underlying structure of these data means that simple, unstructured statistical models do not perform well. We consider the problem of identifying multiple, functionally independent, co-localized genetic regulators of gene transcription. Sparse regression techniques have been critical to multi-SNP association mapping because of their computational tractability in large data settings. However, these traditional models are hindered by the substantial correlation between genetic variants. I describe a model for Bayesian structured sparse regression that incorporates arbitrary structure of the predictors directly into a Gaussian field to yield structure-aware sparse regression coefficients. On simulated data, we find that our approach substantially outperforms current methods. We applied this model to a study of expression QTLs and found that our approach yields highly interpretable, robust solutions for allelic heterogeneity, particularly when the interactions between genetic variants are well approximated by an additive model. This is joint work with Ryan Adams (Harvard).
