Predicting the cost of illness: a comparison of alternative models applied to stroke.
Predictions of cost over well-defined time horizons are frequently required in the analysis of clinical trials and social experiments, for decision models investigating the cost-effectiveness of interventions, and for macro-level estimates of the resource impact of disease. With rare exceptions, cost predictions used in such applications continue to take the form of deterministic point estimates. However, the growing availability of large administrative and clinical data sets offers new opportunities for a more general approach to disease cost forecasting: the estimation of multivariable cost functions that yield predictions at the individual level, conditional on intervention(s), patient characteristics, and other factors. This raises the fundamental question of how to choose the "best" cost model for a given application. The central purpose of this paper is to demonstrate how to evaluate competing models on the basis of predictive validity. This concept is operationalized according to three alternative criteria: 1) root mean square error (RMSE), for evaluating predicted mean cost; 2) mean absolute error (MAE), for evaluating predicted median cost; and 3) a logarithmic scoring rule (log score), an information-theoretic index for evaluating the entire predictive distribution of cost. To illustrate these concepts, the authors conducted a split-sample analysis of data from a national sample of Medicare-covered patients hospitalized for ischemic stroke in 1991 and followed to the end of 1993. Using test and training samples of about 500,000 observations each, they investigated five models: single-equation linear models, with and without log transform of cost; two-part (mixture) models, with and without log transform, to directly address the problem of zero-cost observations; and a Cox proportional-hazards model stratified by time interval. For deriving the predictive distribution of cost, the log-transformed two-part and proportional-hazards models are superior. For deriving the predicted mean or median cost, these two models and the commonly used log-transformed linear model all perform about the same. The untransformed models are dominated in every instance. The approaches to model selection illustrated here can be applied across a wide range of settings.
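The three criteria named in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the abstract does not give their exact formulas, so the log score below assumes a hypothetical two-part predictive distribution (a point mass at zero cost plus a lognormal density for positive costs), and the function names, parameters, and sample costs are invented for illustration only.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: compares predicted mean cost with observed cost."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error: compares predicted median cost with observed cost."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal distribution at x > 0 (mu, sigma on the log scale)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def two_part_log_score(actual, p_zero, mu, sigma):
    """Average log predictive probability/density of the observed costs.

    Assumed predictive distribution (hypothetical, for illustration):
    cost is 0 with probability p_zero, otherwise lognormal(mu, sigma).
    Higher (less negative) values indicate a better predictive distribution.
    """
    total = 0.0
    for y in actual:
        if y == 0:
            total += math.log(p_zero)
        else:
            total += math.log((1.0 - p_zero) * lognormal_pdf(y, mu, sigma))
    return total / len(actual)

# Made-up test-sample costs, including one zero-cost observation.
observed = [0.0, 1200.0, 3500.0, 800.0]

# Two hypothetical candidate models that differ only in the log-scale location.
score_a = two_part_log_score(observed, p_zero=0.25, mu=7.0, sigma=1.0)
score_b = two_part_log_score(observed, p_zero=0.25, mu=3.0, sigma=1.0)
print("model A log score:", score_a)
print("model B log score:", score_b)
```

In a split-sample design such as the one described, each candidate model would be fitted on the training sample and these quantities computed on the test sample; RMSE and MAE judge point predictions, while the log score rewards a model for placing high predictive probability on the costs actually observed.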
Lipscomb, J; Ancukiewicz, M; Parmigiani, G; Hasselblad, V; Samsa, G; Matchar, DB