Adaptive Bayesian Nonparametric Smoothing with Markov Random Fields and Shrinkage Priors
The need to estimate unknown functions or surfaces arises in many disciplines in science and there are many statistical methods available to do this. Our interest lies in using Bayesian nonparametric approaches to estimate unknown functions. One such approach to nonparametric estimation is based on the Gaussian Markov random field priors. This class of computationally efficient and flexible methods is widely used in applications. There is frequently the need to estimate functions with change points, discontinuities, or abrupt changes, or functions with varying levels of smoothness. Gaussian Markov random fields have limited ability to accurately capture such features. We develop a locally adaptive version of Markov random fields that uses shrinkage priors on the order-$k$ increments of the discretized function and has the flexibility to accommodate a large class of functional behaviors. We show that the horseshoe prior results in superior performance in comparison to other shrinkage priors. The horseshoe prior induces sparsity in the increments, which provides good smoothing properties, and at the same time the heavy tails of the prior allow for jumps and discontinuities in the field. We first apply the method to some standard settings where we use simulated data to compare to other methods and then apply the models to two benchmark data examples frequently used to test nonparametric methods. We use Hamiltonian Monte Carlo to approximate the posterior distribution of model parameters because this method provides superior performance in the presence of the high dimensionality and strong parameter correlations exhibited by our models. We then extend the method to the estimation of effective population sizes using the coalescent process and genetic sequence data. For that application, we develop a custom Markov chain Monte Carlo sampler based on a combination of elliptical slice sampling and Gibbs sampling. We test the method using simulated data and then use it to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. Finally, we extend the method for use in the spatial setting, where we apply the method to disease mapping and to estimating the intensity of an inhomogeneous spatial point process. Overall, we find that this method is flexible enough to accommodate a variety of data generating models and offers the adaptive properties and computational tractability to make it a useful addition to the Bayesian nonparametric toolbox.