Graphical Models for Single Cell Gene Expression
Advisors: Raphael Gottardo and Mathias Drton
Until recently, gene expression experiments have relied on aggregates of many thousands of cells as their input and unit of experimentation. However, it is now possible to interrogate gene expression in single cells, and these experiments promise to shed light on cell-to-cell variation in expression. A distinctive feature of single cell expression is zero-inflation of otherwise continuous measurements: a measurement is either strongly positive or undetectable. We employ a two-part linear model (a hurdle model) to accommodate this feature. The hurdle model leads to a class of univariate tests for differential expression, and it admits a generalization that allows multivariate, graphical Markov modeling of statistical independences for such zero-inflated, continuous data. The multivariate model can be shown to be a finite mixture of singular Gaussian distributions. Although the joint likelihood involves an intractable normalizing constant, the conditional likelihood is tractable and allows neighborhood-based inference of conditional independences. Four parameters must simultaneously vanish for conditional independence to hold between a pair of coordinates, given all others. To infer independence patterns between genes, we therefore apply a penalized maximum likelihood approach using the so-called group lasso penalty, which shrinks each group of four parameters toward zero jointly. We hope that this procedure will yield efficient inference in high-dimensional problems.
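To illustrate why a group penalty suits the requirement that four parameters vanish together, here is a minimal Python sketch of group soft-thresholding, the proximal operator of the group lasso penalty. It is an illustration under stated assumptions, not the fitting procedure used in this work: the function names, the parameter values, and the penalty level `lam` are all hypothetical, and each row of `params` stands in for the four interaction parameters linking one gene to one candidate neighbor.

```python
import numpy as np


def group_soft_threshold(theta, lam):
    """Proximal operator of lam * ||theta||_2 for a single parameter group.

    The whole group is set to zero when its Euclidean norm is at most lam;
    otherwise every entry is shrunk by the same factor.
    """
    theta = np.asarray(theta, dtype=float)
    norm = np.linalg.norm(theta)
    if norm <= lam:
        return np.zeros_like(theta)
    return (1.0 - lam / norm) * theta


def prox_group_lasso(params, lam):
    """Apply group soft-thresholding row by row (one row = one group of four)."""
    return np.vstack([group_soft_threshold(row, lam) for row in params])


# Hypothetical interaction parameters for three candidate neighbors of one gene.
params = np.array([
    [0.05, -0.02, 0.01, 0.03],  # weak group: removed as a whole
    [0.80, -0.40, 0.30, 0.10],  # strong group: kept, but shrunk
    [0.00, 0.00, 0.00, 0.00],   # already zero
])

shrunk = prox_group_lasso(params, lam=0.1)

# A candidate is retained as a neighbor only if its group is not entirely zero.
neighbors = np.flatnonzero(np.linalg.norm(shrunk, axis=1) > 0)
print(neighbors)
```

Because the threshold acts on the norm of the whole group, the four parameters for a gene pair are either all zero or all nonzero, which matches the all-or-nothing condition for conditional independence described above.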