Bayesian Robust Variable and Transformation Selection: A Unified Approach

Tech Report Number: 508

 

Abstract

We consider the problem of simultaneous variable and transformation selection for linear regression. We propose a fully Bayesian solution to the problem, which allows us to average over all possible models, including transformations of the response and predictors. We use the Box-Cox family of transformations to transform the response and each predictor. To deal with the change of scale induced by the transformations, we propose to focus on new quantities rather than the estimated regression coefficients. These quantities, which we call generalized regression coefficients, have a similar interpretation to the usual regression coefficients on the original scale of the data, but do not depend on the transformations. This allows us to make probabilistic statements about the size of the effect associated with each variable on the original scale of the data. Finally, in addition to variable and transformation selection, there is also uncertainty involved in the identification of outliers in regression. We therefore also propose a more robust model that accounts for such outliers, based on a t-distribution with unknown degrees of freedom. Parameter estimation is carried out using an efficient Markov chain Monte Carlo algorithm, which allows us to move around the space of all possible models. Using three real data sets and a simulated one, we show that there is considerable uncertainty in model selection, transformation selection and outlier identification, and that the three should be done simultaneously.
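
For readers unfamiliar with the Box-Cox family mentioned in the abstract, the following is a minimal sketch of the transformation in its standard textbook form. The abstract does not state the exact parameterization used in the report, so the formula, the function name `box_cox`, and the parameter name `lam` are illustrative assumptions and are not taken from the report itself.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y.

    Assumed textbook form (not confirmed by the report):
        (y**lam - 1) / lam   if lam != 0
        log(y)               if lam == 0
    """
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical illustrative values, not data from the report.
values = [0.5, 1.0, 2.0, 4.0]

# lam = 1 leaves the data on its original scale, up to a shift of -1.
shifted = [box_cox(v, 1.0) for v in values]

# lam = 0 corresponds to the log transform.
logged = [box_cox(v, 0.0) for v in values]
```

Under this form, the case `lam = 0` is the limit of the power case as `lam` approaches zero, which is why the family is usually treated as continuous in its parameter.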

 

tr508.pdf (268.82 KB)