Scholars@Duke publication: Misuse of DeLong test to compare AUCs for nested models.

Publication, Journal Article

Demler, OV; Pencina, MJ; D'Agostino, RB

Published in: Stat Med

October 15, 2012

The area under the receiver operating characteristic curve (AUC of ROC) is a widely used measure of discrimination in risk prediction models. Routinely, the Mann-Whitney statistic is used as an estimator of AUC, while the change in AUC is tested by the DeLong test. However, very often, in settings where the model is developed and tested on the same dataset, the added predictor is statistically significantly associated with the outcome but fails to produce a significant improvement in the AUC. No conclusive resolution exists to explain this finding. In this paper, we show that the reason lies in the inappropriate application of the DeLong test in the setting of nested models. Using numerical simulations and a theoretical argument based on generalized U-statistics, we show that if the added predictor is not statistically significantly associated with the outcome, the null distribution is non-normal, contrary to the assumption of the DeLong test. Our simulations of different scenarios show that the loss of power because of such a misuse of the DeLong test leads to a conservative test for small and moderate effect sizes. This problem does not exist in cases of predictors that are associated with the outcome and for non-nested models. We suggest that for nested models, only the test of association be performed for the new predictors, and if the result is significant, change in AUC be estimated with an appropriate confidence interval, which can be based on the DeLong approach.
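As an illustration of the Mann-Whitney estimator of AUC mentioned in the abstract, the following is a minimal Python sketch, not code from the paper. The function name `mann_whitney_auc`, the convention of counting ties as one half, and the example risk scores are all assumptions made for illustration. The sketch only estimates AUC; it does not implement the DeLong test or the test of association.

```python
def mann_whitney_auc(event_scores, nonevent_scores):
    """Mann-Whitney estimate of AUC.

    Returns the proportion of (event, non-event) pairs in which the event
    has the higher score. Ties count as 1/2 (an assumed, common convention).
    """
    if not event_scores or not nonevent_scores:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for x in event_scores:
        for y in nonevent_scores:
            if x > y:
                total += 1.0   # concordant pair
            elif x == y:
                total += 0.5   # tied pair
    return total / (len(event_scores) * len(nonevent_scores))


# Hypothetical predicted risks, for illustration only.
events = [0.9, 0.8, 0.4]
nonevents = [0.7, 0.4, 0.2, 0.1]
auc = mann_whitney_auc(events, nonevents)
print(auc)  # 0.875 (10.5 concordant-equivalent pairs out of 12)
```

The double loop is quadratic in sample size and is meant only to make the definition explicit; a rank-based computation would be preferable for large datasets.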

Duke Scholars

Author: Michael J Pencina, Biostatistics & Bioinformatics, Division of Biostatistics

Published In

Stat Med

DOI

10.1002/sim.5328

EISSN

1097-0258

Publication Date

October 15, 2012

Volume

31

Issue

23

Start / End Page

2577 / 2587

Location

England

Related Subject Headings

Statistics & Probability
Risk Assessment
ROC Curve
Models, Statistical
Humans
Data Interpretation, Statistical
Coronary Disease
Computer Simulation
Area Under Curve
4905 Statistics

Citation

APA

Demler, O. V., Pencina, M. J., & D’Agostino, R. B. (2012). Misuse of DeLong test to compare AUCs for nested models. Stat Med, 31(23), 2577–2587. https://doi.org/10.1002/sim.5328

Chicago

Demler, Olga V., Michael J. Pencina, and Ralph B. D’Agostino. “Misuse of DeLong test to compare AUCs for nested models.” Stat Med 31, no. 23 (October 15, 2012): 2577–87. https://doi.org/10.1002/sim.5328.

ICMJE

Demler OV, Pencina MJ, D’Agostino RB. Misuse of DeLong test to compare AUCs for nested models. Stat Med. 2012 Oct 15;31(23):2577–87.

MLA

Demler, Olga V., et al. “Misuse of DeLong test to compare AUCs for nested models.” Stat Med, vol. 31, no. 23, Oct. 2012, pp. 2577–87. Pubmed, doi:10.1002/sim.5328.

NLM

Demler OV, Pencina MJ, D’Agostino RB. Misuse of DeLong test to compare AUCs for nested models. Stat Med. 2012 Oct 15;31(23):2577–2587.
