Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations, and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions, and is a property of the forecasts only. A simple theoretical framework phrased in terms of a game between nature and forecaster allows us to distinguish probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform (PIT) histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy center in the U.S. Pacific Northwest. In combination with cross-validation or in the time series context, our methods provide very general, nonparametric tools for model criticism and model selection as well. This is joint work with Fadoua Balabdaoui and Adrian Raftery. A Technical Report is available at www.stat.washington.edu/www/research/reports/2005/tr483.pdf.
Tilmann Gneiting
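The diagnostics named in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' code. It assumes Gaussian predictive distributions and a hypothetical setup in which nature draws a signal mu_t from N(0, 1) and an observation y_t from N(mu_t, 1). Three illustrative forecasters are compared: an "ideal" one that issues N(mu_t, 1), a "climatological" one that always issues N(0, 2), and an "overconfident" one that issues N(mu_t, 0.25). All function names, forecaster labels and parameter values are assumptions made for this example. The proper scoring rule used is the continuous ranked probability score (CRPS) in its closed form for a Gaussian forecast, and sharpness is summarized by the mean width of the central 90% prediction interval.

```python
import math
import random


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def pit_values(obs, mus, sigmas):
    """PIT value F_t(y_t) for each Gaussian forecast N(mu_t, sigma_t^2)."""
    return [norm_cdf((y - m) / s) for y, m, s in zip(obs, mus, sigmas)]


def pit_histogram(pits, bins=10):
    """Relative frequencies of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pits:
        counts[min(int(p * bins), bins - 1)] += 1
    return [c / len(pits) for c in counts]


def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a Gaussian forecast; smaller is better."""
    z = (y - mu) / sigma
    return sigma * (
        z * (2.0 * norm_cdf(z) - 1.0)
        + 2.0 * norm_pdf(z)
        - 1.0 / math.sqrt(math.pi)
    )


def mean_interval_width(sigmas, z=1.6449):
    """Sharpness: mean width of the central 90% prediction interval."""
    return sum(2.0 * z * s for s in sigmas) / len(sigmas)


def simulate(n=5000, seed=0):
    """Compare three hypothetical forecasters on simulated data."""
    rng = random.Random(seed)
    mus = [rng.gauss(0.0, 1.0) for _ in range(n)]   # nature's signal
    obs = [rng.gauss(m, 1.0) for m in mus]          # observations
    forecasters = {
        "ideal": (mus, [1.0] * n),
        "climatological": ([0.0] * n, [math.sqrt(2.0)] * n),
        "overconfident": (mus, [0.5] * n),
    }
    results = {}
    for name, (m, s) in forecasters.items():
        scores = [crps_gaussian(y, mi, si) for y, mi, si in zip(obs, m, s)]
        results[name] = {
            "pit_hist": pit_histogram(pit_values(obs, m, s)),
            "mean_width": mean_interval_width(s),
            "mean_crps": sum(scores) / n,
        }
    return results


if __name__ == "__main__":
    for name, r in simulate().items():
        hist = " ".join(f"{h:.2f}" for h in r["pit_hist"])
        print(f"{name:15s} width={r['mean_width']:.2f} "
              f"CRPS={r['mean_crps']:.3f} PIT=[{hist}]")
```

Under these assumptions, the ideal and climatological forecasters should both show roughly flat PIT histograms, so the PIT histogram alone cannot separate them. The narrower intervals and lower mean CRPS of the ideal forecaster do, which is the sense of maximizing sharpness subject to calibration. The overconfident forecaster is sharper still but should show a U-shaped PIT histogram, indicating a lack of calibration.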