Subjectivity in the Creation of Machine Learning Models

Journal Article

Transportation analysts are inundated with requests to apply popular machine learning modeling techniques to datasets in order to uncover never-before-seen relationships that could potentially revolutionize safety, congestion, and mobility. However, the results from such models can be influenced not just by biases in the underlying data but also by practitioner-induced biases. To demonstrate the significant number of subjective judgments made in the development and interpretation of machine learning models, we developed Logistic Regression and Neural Network models for transportation-focused datasets, including those covering driving injuries and fatalities and pedestrian fatalities. We then developed five different representations of feature importance for each dataset, including different feature interpretations commonly used in the machine learning community. Twelve distinct judgments were highlighted in the development and interpretation of these models, which produced inconsistent results. Such inconsistencies can lead to very different interpretations of the results and, in turn, to errors of commission and omission, with significant cost and safety implications if policies are erroneously adapted from such outcomes.
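The point that different representations of feature importance can disagree can be illustrated with a minimal sketch. It is not the authors' method, data, or results: the two features `x1` and `x2`, their scales, the generating coefficients, and the helper functions are all hypothetical, and only two of many possible importance representations are compared (raw versus standardized logistic regression coefficients).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def standardize(col):
    """Return the column rescaled to zero mean and unit variance, plus its std."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / sd for v in col], sd

def fit_logistic(X, y, lr=0.5, steps=300):
    """Full-batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic data (assumption): x1 has unit scale, x2 has a scale of about 100.
random.seed(0)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 100) for _ in range(n)]
y = [1 if random.random() < sigmoid(1.0 * a + 0.03 * b) else 0
     for a, b in zip(x1, x2)]

# Fit on standardized features, which keeps gradient descent well conditioned.
z1, sd1 = standardize(x1)
z2, sd2 = standardize(x2)
w_std, _ = fit_logistic(list(zip(z1, z2)), y)

# The same fitted model expressed in original units: divide each weight by its std.
w_raw = [w_std[0] / sd1, w_std[1] / sd2]

# Two importance representations: rank features by |coefficient| in each form.
names = ["x1", "x2"]
rank_raw = sorted(names, key=lambda f: -abs(w_raw[names.index(f)]))
rank_std = sorted(names, key=lambda f: -abs(w_std[names.index(f)]))
print("ranked by raw coefficient:         ", rank_raw)
print("ranked by standardized coefficient:", rank_std)
```

Both rankings come from one fitted model. Under these assumed scales, the choice between reporting raw and standardized coefficients is enough to change which feature appears most important, which is the kind of practitioner judgment the abstract refers to.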

Duke Authors

Cited Authors

  • Cummings, ML; Li, S

Published Date

  • May 13, 2021

Published In

Volume / Issue

  • 13 / 2

Electronic International Standard Serial Number (EISSN)

  • 1936-1963

International Standard Serial Number (ISSN)

  • 1936-1955

Digital Object Identifier (DOI)

  • 10.1145/3418034

Citation Source

  • Scopus