Subjectivity in the Creation of Machine Learning Models
Journal Article
Transportation analysts are inundated with requests to apply popular machine learning modeling techniques to datasets to uncover never-before-seen relationships that could potentially revolutionize safety, congestion, and mobility. However, the results from such models can be influenced not only by biases in the underlying data but also by practitioner-induced biases. To demonstrate the significant number of subjective judgments made in the development and interpretation of machine learning models, we developed Logistic Regression and Neural Network models for transportation-focused datasets, including those covering driving injuries and fatalities and pedestrian fatalities. We then developed five different representations of feature importance for each dataset, including different feature interpretations commonly used in the machine learning community. Twelve distinct judgments were highlighted in the development and interpretation of these models, which produced inconsistent results. Such inconsistencies can lead to very different interpretations of the results and, in turn, to errors of commission and omission, with significant cost and safety implications if policies are erroneously adapted from such outcomes.
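As a rough illustration of how a single practitioner judgment can change a feature-importance ranking, the sketch below fits one logistic regression and ranks its two features in two ways: by coefficient per standardized unit and by coefficient per raw unit. This is not the authors' method, data, or any of their five representations. The synthetic data, the feature names `x_small` and `x_large`, and the helpers `standardize` and `fit_logistic` are all assumptions made for the example.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic data (assumption): two features on very different scales.
n = 1000
x_small = [random.gauss(0.0, 1.0) for _ in range(n)]    # std about 1
x_large = [random.gauss(0.0, 100.0) for _ in range(n)]  # std about 100
logits = [1.0 * a + 0.02 * b for a, b in zip(x_small, x_large)]
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-z)) else 0 for z in logits]


def standardize(col):
    """Return the z-scored column and its (population) standard deviation."""
    mean = sum(col) / len(col)
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / std for v in col], std


def fit_logistic(X, y, lr=1.0, epochs=400):
    """Unregularized logistic regression fit by batch gradient descent."""
    n_rows, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n_rows for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_rows
    return w, b


names = ["x_small", "x_large"]
z_small, std_small = standardize(x_small)
z_large, std_large = standardize(x_large)
w_std, _ = fit_logistic(list(zip(z_small, z_large)), y)

# Judgment A: importance = |coefficient| per one standard deviation.
importance_std = dict(zip(names, (abs(w) for w in w_std)))
# Judgment B: importance = |coefficient| per one raw unit (w_std / std).
stds = (std_small, std_large)
importance_raw = dict(zip(names, (abs(w) / s for w, s in zip(w_std, stds))))

rank_std = sorted(names, key=importance_std.get, reverse=True)
rank_raw = sorted(names, key=importance_raw.get, reverse=True)
print("ranking by standardized coefficient:", rank_std)
print("ranking by raw coefficient:", rank_raw)
```

Both rankings come from the same fitted model, so the disagreement is entirely a product of the scaling choice. Dividing the standardized coefficient by the feature's standard deviation gives the raw-unit coefficient because an unregularized fit is unaffected by rescaling the inputs.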
Duke Authors
Cited Authors
- Cummings, ML; Li, S
Published Date
- May 13, 2021
Published In
Volume / Issue
- 13 / 2
Electronic International Standard Serial Number (EISSN)
- 1936-1963
International Standard Serial Number (ISSN)
- 1936-1955
Digital Object Identifier (DOI)
- 10.1145/3418034
Citation Source
- Scopus