Subjectivity in the Creation of Machine Learning Models
Transportation analysts are inundated with requests to apply popular machine learning techniques to datasets in the hope of uncovering never-before-seen relationships that could revolutionize safety, congestion, and mobility. However, the results of such models can be influenced not only by biases in the underlying data but also by practitioner-induced biases. To demonstrate the significant number of subjective judgments made in developing and interpreting machine learning models, we developed Logistic Regression and Neural Network models for transportation-focused datasets, including datasets on driving injuries and fatalities and on pedestrian fatalities. We then developed five different representations of feature importance for each dataset, including feature interpretations commonly used in the machine learning community. Twelve distinct judgments were highlighted in the development and interpretation of these models, and the results they produced were inconsistent. Such inconsistencies can lead to very different interpretations of the results and, in turn, to errors of commission and omission, with significant cost and safety implications if policies are erroneously adapted from such outcomes.
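As a minimal sketch of how a single interpretive choice can change a feature-importance ranking, the following Python example fits one logistic regression to synthetic data and reads its coefficients two ways: per standard deviation and per raw unit. It is not the authors' method, data, or one of their five representations. The names `feature_a` and `feature_b`, the data-generating process, and the helper functions are all hypothetical and chosen only for illustration.

```python
import math
import random


def make_data(n=1000, seed=0):
    """Synthetic data (an assumption): feature_a in [0, 1], feature_b in [0, 100]."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        a = rng.uniform(0.0, 1.0)
        b = rng.uniform(0.0, 100.0)
        # True log-odds: feature_b has twice the effect per standard deviation.
        logit = 1.0 * (a - 0.5) / 0.2887 + 2.0 * (b - 50.0) / 28.87
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append((a, b))
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


def standardize(rows):
    """Return z-scored rows and the per-feature standard deviations."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(k)]
    z = [tuple((r[j] - means[j]) / sds[j] for j in range(k)) for r in rows]
    return z, sds


def fit_logistic(z, y, lr=1.0, steps=200):
    """Plain batch gradient descent on the mean logistic loss."""
    n, k = len(z), len(z[0])
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * k, 0.0
        for xi, yi in zip(z, y):
            s = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-s)) - yi
            gb += err
            for j in range(k):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return w, b


def rank(names, scores):
    """Feature names ordered from largest to smallest score."""
    return [name for _, name in sorted(zip(scores, names), reverse=True)]


names = ["feature_a", "feature_b"]
rows, labels = make_data()
z, sds = standardize(rows)
w_std, _ = fit_logistic(z, labels)             # coefficients per standard deviation
w_raw = [w / sd for w, sd in zip(w_std, sds)]  # same model, per raw unit

rank_std = rank(names, [abs(w) for w in w_std])
rank_raw = rank(names, [abs(w) for w in w_raw])
print("standardized coefficients:", rank_std)
print("raw-unit coefficients:   ", rank_raw)
```

Both rankings describe the same fitted model. Under these assumptions the ordering depends only on whether the practitioner reads coefficients per raw unit or per standard deviation, which is one simple way that a subjective judgment can alter an interpretation without any change to the data or the model.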
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Related Subject Headings
- 4610 Library and information studies
- 4605 Data management and data science
- 08 Information and Computing Sciences