Fitting Stochastic Epidemic Models to Multiple Data Types

Mingwei Tang

Traditional infectious disease epidemiology focuses on fitting deterministic and stochastic epidemic models to surveillance case count data. Recently, researchers have begun to use genetic data from infectious disease agents to complement statistical analyses of case counts. Such genetic analyses rely on phylodynamics, a set of population genetics tools that aim to reconstruct the demographic history of a population from molecular sequences of individuals sampled from that population. In this thesis, we design a general framework that can fit stochastic epidemic models to surveillance count data alone, to genetic data alone, or to both sources of information at the same time. First, we propose a Bayesian model that combines phylodynamic inference and stochastic epidemic models. We bypass computationally intensive particle Markov chain Monte Carlo (MCMC) methods and achieve computational tractability by using a linear noise approximation (LNA), a technique that allows us to approximate probability densities of stochastic epidemic model trajectories. The LNA opens the door to using modern MCMC tools to approximate the joint posterior distribution of the disease transmission parameters and of high-dimensional vectors describing unobserved changes in the stochastic epidemic model compartment sizes (e.g., numbers of infectious and susceptible individuals). Next, we propose a joint model that integrates incidence data and genetic data. Finally, we account for the dependence of genetic sequence sampling times on the latent prevalence of the infectious disease and propose a preferential sampling phylodynamics model that improves the performance of phylodynamic inference. In a series of simulation studies, we show that all our proposed estimation methods can successfully recover parameters of stochastic epidemic models.
Moreover, we demonstrate that combining multiple data types helps resolve identifiability issues and improves estimation precision. Throughout the dissertation, we use the incidence and genetic data from the 2014 Ebola epidemic in Sierra Leone and Liberia to illustrate our methodological developments.
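
As a rough illustration of the linear noise approximation mentioned in the abstract, the sketch below integrates the LNA mean and covariance equations for a closed SIR model with a simple Euler scheme. It is a minimal example under stated assumptions, not the thesis's implementation: the function name `lna_sir`, the parameter names, the choice of a plain SIR model with density-dependent rates, and the fixed step size are all hypothetical choices made for this illustration.

```python
import numpy as np

def lna_sir(beta, gamma, N, S0, I0, t_end, dt=0.01):
    """Euler-integrate the LNA mean and covariance ODEs for a closed SIR model.

    Assumed (illustrative) setup: state X = (S, I), infection rate
    beta * S * I / N, recovery rate gamma * I. Returns the approximate
    Gaussian mean and covariance of (S, I) at time t_end.
    """
    m = np.array([S0, I0], dtype=float)   # mean of (S, I)
    Sigma = np.zeros((2, 2))              # covariance, zero at a known start
    # Stoichiometry: one row per reaction, one column per compartment.
    A = np.array([[-1.0, 1.0],            # infection: S -> I
                  [0.0, -1.0]])           # recovery:  I -> R
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        S, I = m
        h = np.array([beta * S * I / N, gamma * I])       # reaction rates
        # Jacobian of the deterministic drift A^T h with respect to (S, I).
        F = np.array([[-beta * I / N, -beta * S / N],
                      [beta * I / N, beta * S / N - gamma]])
        drift = A.T @ h
        diffusion = A.T @ np.diag(h) @ A
        # Both updates use quantities evaluated at the current mean.
        Sigma = Sigma + dt * (F @ Sigma + Sigma @ F.T + diffusion)
        m = m + dt * drift
    return m, Sigma

if __name__ == "__main__":
    mean, cov = lna_sir(beta=0.5, gamma=0.25, N=1000, S0=990, I0=10, t_end=10.0)
    print("approximate mean of (S, I):", mean)
    print("approximate covariance of (S, I):")
    print(cov)
```

The returned mean and covariance define a Gaussian approximation to the distribution of the compartment sizes at `t_end`; in an inference setting, densities of this kind would stand in for the intractable transition densities of the stochastic model. A production implementation would likely use an adaptive ODE solver rather than fixed-step Euler.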