We consider the problem of online prediction when it is uncertain which prediction model is best. We develop a method called Dynamic Model Averaging (DMA), in which a state space model for the parameters of each model is combined with a Markov chain model for the correct model. This allows the “correct” model to vary over time. The state space and Markov chain models are both specified in terms of forgetting, leading to a highly parsimonious representation. The method is applied to the problem of predicting the output strip thickness for a cold rolling mill, where the output is measured with a time delay. We found that when only a small number of physically motivated models were considered and one was clearly best, the method quickly converged to the best model, and the cost of model uncertainty was small; indeed, DMA performed slightly better than the best physical model. When model uncertainty and the number of models considered were large, our method ensured that the penalty for model uncertainty was small. At the beginning of the process, when control is most difficult, we found that DMA over a large model space led to better predictions than the single best performing physically motivated model.
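To make the forgetting-based recursions concrete, here is a minimal sketch of one DMA step. It rests on several assumptions of ours that the abstract does not state. Each candidate model is a linear regression with a single scalar, time-varying coefficient. The noise is Gaussian with a known variance `obs_var`. A forgetting factor `lam` inflates each model's state variance in place of an explicit state-noise term, and a second forgetting factor `alpha` flattens the model probabilities, which stands in for an explicit Markov transition matrix. The names `ScalarModel`, `dma_step`, and `simulate` are hypothetical. The sketch also ignores the measurement delay present in the rolling-mill application.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ScalarModel:
    """Hypothetical candidate model: y_t = x_t * theta_t + eps_t, scalar state theta_t."""
    theta: float = 0.0  # posterior mean of the coefficient after the last update
    var: float = 1.0    # posterior variance of the coefficient after the last update


def normal_pdf(y, mean, var):
    """Gaussian density, used as each model's one-step predictive likelihood."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def dma_step(models, probs, x, y, lam=0.99, alpha=0.99, obs_var=1.0):
    """One DMA step (sketch). x[k] is model k's regressor at time t; y is the output.

    Returns the model-averaged prediction of y (made before y is used) and the
    updated model probabilities. Each model's state is updated in place.
    """
    # State prediction with forgetting: divide variance by lam instead of adding noise.
    pred_vars = [m.var / lam for m in models]
    # Model-probability prediction with forgetting: raise to alpha, then renormalise.
    flattened = [p ** alpha for p in probs]
    norm = sum(flattened)
    pred_probs = [w / norm for w in flattened]
    # Each model's one-step-ahead predictive mean and variance.
    means = [xk * m.theta for xk, m in zip(x, models)]
    fvars = [xk * xk * r + obs_var for xk, r in zip(x, pred_vars)]
    # Combined DMA prediction: average the predictions, weighted by predicted probabilities.
    y_hat = sum(p * mu for p, mu in zip(pred_probs, means))
    # Probability update: weight each model by its predictive likelihood of y.
    post = [p * normal_pdf(y, mu, fv) for p, mu, fv in zip(pred_probs, means, fvars)]
    total = sum(post)
    new_probs = [q / total for q in post]
    # Scalar Kalman update of each model's coefficient.
    for m, xk, r, mu, fv in zip(models, x, pred_vars, means, fvars):
        gain = r * xk / fv
        m.theta += gain * (y - mu)
        m.var = r - gain * xk * r
    return y_hat, new_probs


def simulate(steps=200, seed=0):
    """Toy data: y depends on sin(t) only; model 0 uses sin(t), model 1 uses cos(t)."""
    rng = random.Random(seed)
    models = [ScalarModel(), ScalarModel()]
    probs = [0.5, 0.5]
    for t in range(steps):
        x1, x2 = math.sin(t), math.cos(t)
        y = 2.0 * x1 + rng.gauss(0.0, 0.1)
        _, probs = dma_step(models, probs, [x1, x2], y)
    return models, probs
```

On this toy data the probability of model 0, which uses the relevant regressor, should approach one. Because `alpha` is below one, a model that starts predicting better later can still regain weight. That is the mechanism that lets the “correct” model vary over time.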