A calibration hierarchy for risk models was defined: from utopia to empirical data.

Journal Article
Van Calster, B; Nieboer, D; Vergouwe, Y; De Cock, B; Pencina, MJ; Steyerberg, EW
Published in: J Clin Epidemiol
June 2016

OBJECTIVE: Calibrated risk models are vital for valid decision support. We define four levels of calibration and describe their implications for model development and the external validation of predictions.

STUDY DESIGN AND SETTING: We present results based on simulated data sets.

RESULTS: A common definition of calibration is "having an event rate of R% among patients with a predicted risk of R%," which we refer to as "moderate calibration." Weaker forms of calibration require only that the average predicted risk (mean calibration) or the average prediction effects (weak calibration) be correct. "Strong calibration" requires that the event rate equal the predicted risk for every covariate pattern, which implies that the model is fully correct for the validation setting. We argue that this is unrealistic: the model type may be incorrect, the linear predictor is only asymptotically unbiased, and all nonlinear and interaction effects would have to be modeled correctly. In addition, we prove that moderate calibration guarantees nonharmful decision making. Finally, our results indicate that flexible assessment of calibration in small validation data sets is problematic.

CONCLUSION: Strong calibration is desirable for individualized decision support but unrealistic and counterproductive, because pursuing it stimulates the development of overly complex models. Model development and external validation should focus on moderate calibration.
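The hierarchy lends itself to direct computation. Below is a minimal sketch, not code from the paper, that checks the first three levels (mean, weak, and moderate calibration) for a logistic model on simulated data; the data-generating process, sample sizes, and the baseline-risk shift between development and validation are illustrative assumptions.

```python
# Minimal sketch of the first three calibration levels on simulated data.
# The data-generating model and all settings below are illustrative, not
# taken from the paper.
import numpy as np
import statsmodels.api as sm
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)

def simulate(n, intercept):
    """Two-covariate binary-outcome data; the true model is logistic."""
    X = rng.normal(size=(n, 2))
    logit = intercept + 1.0 * X[:, 0] - 0.5 * X[:, 1]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return X, y

# Development set, plus a validation set whose baseline risk differs,
# so the model will be miscalibrated "in the large" at validation.
X_dev, y_dev = simulate(5000, intercept=-1.0)
X_val, y_val = simulate(5000, intercept=-0.5)

dev_fit = sm.GLM(y_dev, sm.add_constant(X_dev),
                 family=sm.families.Binomial()).fit()
p = dev_fit.predict(sm.add_constant(X_val))   # predicted risks at validation
lp = np.log(p / (1.0 - p))                    # logit of the predicted risk

# 1. Mean calibration: average predicted risk vs. observed event rate.
print(f"mean predicted risk {p.mean():.3f} vs. event rate {y_val.mean():.3f}")

# 2. Weak calibration: calibration intercept (slope fixed at 1 via an
#    offset) and calibration slope; the ideal values are 0 and 1.
intercept_fit = sm.GLM(y_val, np.ones_like(lp),
                       family=sm.families.Binomial(), offset=lp).fit()
slope_fit = sm.GLM(y_val, sm.add_constant(lp),
                   family=sm.families.Binomial()).fit()
print(f"calibration intercept {intercept_fit.params[0]:.3f}, "
      f"slope {slope_fit.params[1]:.3f}")

# 3. Moderate calibration: a flexible (lowess) calibration curve comparing
#    predicted risks with observed outcomes; the paper cautions that such
#    flexible assessment is unstable in small validation sets.
curve = lowess(y_val, p, frac=0.3)            # columns: predicted, observed
```

Fixing the slope at 1 via an offset when estimating the calibration intercept is the conventional recalibration approach; the lowess curve corresponds to the flexible assessment that the paper warns is problematic in small validation samples.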

Published In

J Clin Epidemiol

DOI

10.1016/j.jclinepi.2015.12.005

EISSN

1878-5921

Publication Date

June 2016

Volume

74

Start / End Page

167 / 176

Location

United States

Related Subject Headings

  • Risk Assessment
  • Risk
  • Reproducibility of Results
  • Models, Statistical
  • Humans
  • Epidemiology
  • Decision Support Techniques
  • Computer Simulation
  • Calibration
  • Bias
 

Citation

Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol. 2016 Jun;74:167–76. doi:10.1016/j.jclinepi.2015.12.005