Scholars@Duke publication: A Path to Simpler Models Starts With Noise.

Publication, Journal Article

Semenova, L; Chen, H; Parr, R; Rudin, C

Published in: Advances in neural information processing systems

December 2023

The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.
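To make the definitions in the abstract concrete, here is a minimal sketch of how the Rashomon ratio and a pattern-diversity-style quantity could be computed over a small, finite hypothesis space. Everything in it is an illustrative assumption rather than the authors' method: the one-dimensional toy data, the family of threshold classifiers, the tolerance `epsilon`, the choice of 0-1 error as the performance measure, and the reading of pattern diversity as the mean normalized Hamming distance between distinct prediction patterns in the Rashomon set. The paper's formal definitions may differ.

```python
from itertools import combinations


def make_threshold(t):
    """Hypothetical model family: predict 1 when x >= t, else 0."""
    return lambda x: int(x >= t)


def error_rate(preds, labels):
    """Fraction of points whose prediction differs from the label (0-1 loss)."""
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)


def rashomon_patterns(models, X, y, epsilon):
    """Prediction patterns of all models within epsilon of the best error."""
    scored = []
    for model in models:
        preds = tuple(model(x) for x in X)
        scored.append((preds, error_rate(preds, y)))
    best = min(err for _, err in scored)
    return [preds for preds, err in scored if err <= best + epsilon]


def rashomon_ratio(models, X, y, epsilon):
    """Fraction of the (finite) hypothesis space that lies in the Rashomon set."""
    return len(rashomon_patterns(models, X, y, epsilon)) / len(models)


def pattern_diversity(models, X, y, epsilon):
    """Assumed reading: mean normalized Hamming distance between
    distinct prediction patterns in the Rashomon set."""
    patterns = sorted(set(rashomon_patterns(models, X, y, epsilon)))
    if len(patterns) < 2:
        return 0.0
    dists = [
        sum(a != b for a, b in zip(p, q)) / len(X)
        for p, q in combinations(patterns, 2)
    ]
    return sum(dists) / len(dists)


# Toy data (assumed): eight points on a line, cleanly separable at 0.5.
X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y_clean = [0, 0, 0, 0, 1, 1, 1, 1]
# Same points with two labels flipped by hand to mimic label noise.
y_noisy = [0, 1, 0, 0, 1, 1, 0, 1]

thresholds = [0.05, 0.15, 0.25, 0.35, 0.5, 0.65, 0.75, 0.85, 0.95]
models = [make_threshold(t) for t in thresholds]
epsilon = 0.125  # tolerance on the error rate (assumed value)

for name, y in [("clean", y_clean), ("noisy", y_noisy)]:
    ratio = rashomon_ratio(models, X, y, epsilon)
    diversity = pattern_diversity(models, X, y, epsilon)
    print(f"{name}: Rashomon ratio = {ratio:.3f}, pattern diversity = {diversity:.3f}")
```

On this hand-built example, both quantities come out larger for the noisy labels than for the clean ones. That is consistent with the direction the abstract describes, but a single toy case with hand-flipped labels is not evidence for the paper's result, and it does not model the analyst-side choices that the abstract names as part of the mechanism.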

Duke Scholars

Author Cynthia D. Rudin Computer Science

Author Ronald Parr Computer Science

Published In

Advances in neural information processing systems

ISSN

1049-5258

Publication Date

December 2023

Volume

36

Start / End Page

3362 / 3401

Related Subject Headings

4611 Machine learning
1702 Cognitive Sciences
1701 Psychology

Citation

APA

Semenova, L., Chen, H., Parr, R., & Rudin, C. (2023). A Path to Simpler Models Starts With Noise. Advances in Neural Information Processing Systems, 36, 3362–3401.

Chicago

Semenova, Lesia, Harry Chen, Ronald Parr, and Cynthia Rudin. “A Path to Simpler Models Starts With Noise.” Advances in Neural Information Processing Systems 36 (December 2023): 3362–3401.

ICMJE

Semenova L, Chen H, Parr R, Rudin C. A Path to Simpler Models Starts With Noise. Advances in neural information processing systems. 2023 Dec;36:3362–401.

MLA

Semenova, Lesia, et al. “A Path to Simpler Models Starts With Noise.” Advances in Neural Information Processing Systems, vol. 36, Dec. 2023, pp. 3362–401.

NLM

Semenova L, Chen H, Parr R, Rudin C. A Path to Simpler Models Starts With Noise. Advances in neural information processing systems. 2023 Dec;36:3362–3401.
