Skip to main content

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.

Publication ,  Journal Article
Fisher, A; Rudin, C; Dominici, F
Published in: Journal of machine learning research : JMLR
January 2019

Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model f (x) = x Tβ with a fixed coefficient vector β) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing model in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.

Duke Scholars

Published In

Journal of machine learning research : JMLR

EISSN

1533-7928

ISSN

1532-4435

Publication Date

January 2019

Volume

20

Start / End Page

177

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 4905 Statistics
  • 4611 Machine learning
  • 17 Psychology and Cognitive Sciences
  • 08 Information and Computing Sciences
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Fisher, A., Rudin, C., & Dominici, F. (2019). All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously. Journal of Machine Learning Research : JMLR, 20, 177.
Fisher, Aaron, Cynthia Rudin, and Francesca Dominici. “All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.Journal of Machine Learning Research : JMLR 20 (January 2019): 177.
Fisher A, Rudin C, Dominici F. All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously. Journal of machine learning research : JMLR. 2019 Jan;20:177.
Fisher, Aaron, et al. “All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.Journal of Machine Learning Research : JMLR, vol. 20, Jan. 2019, p. 177.
Fisher A, Rudin C, Dominici F. All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously. Journal of machine learning research : JMLR. 2019 Jan;20:177.

Published In

Journal of machine learning research : JMLR

EISSN

1533-7928

ISSN

1532-4435

Publication Date

January 2019

Volume

20

Start / End Page

177

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 4905 Statistics
  • 4611 Machine learning
  • 17 Psychology and Cognitive Sciences
  • 08 Information and Computing Sciences