A longitudinal analysis of data quality in a large pediatric data research network.

Publication, Journal Article
Khare, R; Utidjian, L; Ruth, BJ; Kahn, MG; Burrows, E; Marsolo, K; Patibandla, N; Razzaghi, H; Colvin, R; Ranade, D; Kitzmiller, M; Eckrich, D ...
Published in: J Am Med Inform Assoc
November 1, 2017

OBJECTIVE: PEDSnet is a clinical data research network (CDRN) that aggregates electronic health record data from multiple children's hospitals to enable large-scale research. Assessing data quality to ensure suitability for conducting research is a key requirement in PEDSnet. This study presents a range of data quality issues identified over a period of 18 months and interprets them to evaluate the research capacity of PEDSnet.

MATERIALS AND METHODS: Results were generated by a semiautomated data quality assessment workflow. Two investigators reviewed programmatic data quality issues and conducted discussions with the data partners' extract-transform-load analysts to determine the cause for each issue.

RESULTS: The results include a longitudinal summary of 2182 data quality issues identified across 9 data submission cycles. The metadata from the most recent cycle includes annotations for 850 issues: most frequent types, including missing data (>300) and outliers (>100); most complex domains, including medications (>160) and lab measurements (>140); and primary causes, including source data characteristics (83%) and extract-transform-load errors (9%).

DISCUSSION: The longitudinal findings demonstrate the network's evolution from identifying difficulties with aligning the data to a common data model to learning norms in clinical pediatrics and determining research capability.

CONCLUSION: While data quality is recognized as a critical aspect in establishing and utilizing a CDRN, the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.

Published In

J Am Med Inform Assoc

DOI

10.1093/jamia/ocx033

EISSN

1527-974X

Publication Date

November 1, 2017

Volume

24

Issue

6

Start / End Page

1072 / 1079

Location

England

Related Subject Headings

  • Medical Informatics
  • Longitudinal Studies
  • Hospitals, Pediatric
  • Electronic Health Records
  • Datasets as Topic
  • Data Accuracy
  • Biomedical Research
  • 46 Information and computing sciences
  • 42 Health sciences
  • 32 Biomedical and clinical sciences
 

Citation

APA
Khare, R., Utidjian, L., Ruth, B. J., Kahn, M. G., Burrows, E., Marsolo, K., … Bailey, L. C. (2017). A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc, 24(6), 1072–1079. https://doi.org/10.1093/jamia/ocx033

Chicago
Khare, Ritu, Levon Utidjian, Byron J. Ruth, Michael G. Kahn, Evanette Burrows, Keith Marsolo, Nandan Patibandla, et al. “A longitudinal analysis of data quality in a large pediatric data research network.” J Am Med Inform Assoc 24, no. 6 (November 1, 2017): 1072–79. https://doi.org/10.1093/jamia/ocx033.

ICMJE
Khare R, Utidjian L, Ruth BJ, Kahn MG, Burrows E, Marsolo K, et al. A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc. 2017 Nov 1;24(6):1072–9.

MLA
Khare, Ritu, et al. “A longitudinal analysis of data quality in a large pediatric data research network.” J Am Med Inform Assoc, vol. 24, no. 6, Nov. 2017, pp. 1072–79. PubMed, doi:10.1093/jamia/ocx033.

NLM
Khare R, Utidjian L, Ruth BJ, Kahn MG, Burrows E, Marsolo K, Patibandla N, Razzaghi H, Colvin R, Ranade D, Kitzmiller M, Eckrich D, Bailey LC. A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc. 2017 Nov 1;24(6):1072–1079.