The Trials and Tribulations of Assembling Large Medical Imaging Datasets for Machine Learning Applications.

Publication, Journal Article
Magudia, K; Bridge, CP; Andriole, KP; Rosenthal, MH
Published in: J Digit Imaging
December 2021

With vast interest in machine learning applications, more investigators are proposing to assemble large datasets for machine learning applications. We aim to delineate multiple possible roadblocks to exam retrieval that may present themselves and lead to significant time delays. This HIPAA-compliant, institutional review board-approved, retrospective clinical study required identification and retrieval of all outpatient and emergency patients undergoing abdominal and pelvic computed tomography (CT) at three affiliated hospitals in the year 2012. If a patient had multiple abdominal CT exams, the first exam was selected for retrieval (n=23,186). Our experience in attempting to retrieve 23,186 abdominal CT exams yielded 22,852 valid CT abdomen/pelvis exams and identified four major categories of challenges when retrieving large datasets: cohort selection and processing, retrieving DICOM exam files from PACS, data storage, and non-recoverable failures. The retrieval took 3 months of project time and at minimum 300 person-hours of time between the primary investigator (a radiologist), a data scientist, and a software engineer. Exam selection and retrieval may take significantly longer than planned. We share our experience so that other investigators can anticipate and plan for these challenges. We also hope to help institutions better understand the demands that may be placed on their infrastructure by large-scale medical imaging machine learning projects.
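The cohort rule stated in the abstract (keep only the first exam when a patient has several) and the reported retrieval counts can be illustrated with a minimal sketch. The record layout, patient IDs, accession numbers, and the helper name `first_exam_per_patient` are hypothetical and are not taken from the paper; only the figures 23,186 and 22,852 come from the abstract.

```python
from datetime import date


def first_exam_per_patient(exams):
    """Return {patient_id: accession} for each patient's earliest exam.

    `exams` is an iterable of (patient_id, exam_date, accession) tuples.
    This schema is an assumption made for illustration only.
    """
    earliest = {}
    for patient_id, exam_date, accession in exams:
        current = earliest.get(patient_id)
        # Keep this exam if the patient is new or the exam is earlier.
        if current is None or exam_date < current[0]:
            earliest[patient_id] = (exam_date, accession)
    return {pid: acc for pid, (_, acc) in earliest.items()}


# Hypothetical sample records; not real study data.
exams = [
    ("P001", date(2012, 3, 14), "A101"),
    ("P001", date(2012, 1, 9), "A100"),
    ("P002", date(2012, 7, 2), "A200"),
]
selected = first_exam_per_patient(exams)

# Counts reported in the abstract.
requested = 23186
valid = 22852
unrecovered = requested - valid  # exams that did not yield a valid result
yield_pct = round(100 * valid / requested, 2)
```

With the sample records above, patient P001 keeps the January exam rather than the March one. The arithmetic on the reported counts gives 334 exams without a valid result, a yield of roughly 98.6%.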

Published In

J Digit Imaging

DOI

10.1007/s10278-021-00505-7

EISSN

1618-727X

Publication Date

December 2021

Volume

34

Issue

6

Start / End Page

1424 / 1429

Location

United States

Related Subject Headings

  • Tomography, X-Ray Computed
  • Retrospective Studies
  • Radiography
  • Nuclear Medicine & Medical Imaging
  • Machine Learning
  • Humans
  • Abdomen
  • 3202 Clinical sciences
  • 1103 Clinical Sciences
 

Citation

APA
Magudia, K., Bridge, C. P., Andriole, K. P., & Rosenthal, M. H. (2021). The Trials and Tribulations of Assembling Large Medical Imaging Datasets for Machine Learning Applications. J Digit Imaging, 34(6), 1424–1429. https://doi.org/10.1007/s10278-021-00505-7

Chicago
Magudia, Kirti, Christopher P. Bridge, Katherine P. Andriole, and Michael H. Rosenthal. “The Trials and Tribulations of Assembling Large Medical Imaging Datasets for Machine Learning Applications.” J Digit Imaging 34, no. 6 (December 2021): 1424–29. https://doi.org/10.1007/s10278-021-00505-7.

ICMJE
Magudia K, Bridge CP, Andriole KP, Rosenthal MH. The Trials and Tribulations of Assembling Large Medical Imaging Datasets for Machine Learning Applications. J Digit Imaging. 2021 Dec;34(6):1424–9.

MLA
Magudia, Kirti, et al. “The Trials and Tribulations of Assembling Large Medical Imaging Datasets for Machine Learning Applications.” J Digit Imaging, vol. 34, no. 6, Dec. 2021, pp. 1424–29. Pubmed, doi:10.1007/s10278-021-00505-7.

NLM
Magudia K, Bridge CP, Andriole KP, Rosenthal MH. The Trials and Tribulations of Assembling Large Medical Imaging Datasets for Machine Learning Applications. J Digit Imaging. 2021 Dec;34(6):1424–1429.