A longitudinal analysis of data quality in a large pediatric data research network.

Published

Journal Article

PEDSnet is a clinical data research network (CDRN) that aggregates electronic health record data from multiple children's hospitals to enable large-scale research. Assessing data quality to ensure suitability for conducting research is a key requirement in PEDSnet. This study presents a range of data quality issues identified over a period of 18 months and interprets them to evaluate the research capacity of PEDSnet.

Results were generated by a semiautomated data quality assessment workflow. Two investigators reviewed programmatic data quality issues and conducted discussions with the data partners' extract-transform-load analysts to determine the cause of each issue.

The results include a longitudinal summary of 2182 data quality issues identified across 9 data submission cycles. The metadata from the most recent cycle includes annotations for 850 issues: most frequent types, including missing data (>300) and outliers (>100); most complex domains, including medications (>160) and lab measurements (>140); and primary causes, including source data characteristics (83%) and extract-transform-load errors (9%).

The longitudinal findings demonstrate the network's evolution from identifying difficulties with aligning the data to a common data model to learning norms in clinical pediatrics and determining research capability.

While data quality is recognized as a critical aspect in establishing and utilizing a CDRN, the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.
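The abstract names missing data and outliers as the most frequent issue types. It does not say how the programmatic checks were implemented. The sketch below is only a rough illustration of how such checks might flag both kinds of issue in a single column of values. Every detail in it is a hypothetical choice and is not PEDSnet's workflow: the `Issue` record, the `check_missing` and `check_outliers` helpers, the 10% missingness threshold, the `weight_kg` field, and the use of Tukey's IQR fences for outliers.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Issue:
    """One flagged data quality issue (hypothetical fields, not a PEDSnet schema)."""
    domain: str      # hypothetical domain label, e.g. "measurement"
    check_type: str  # "missing data" or "outlier"
    detail: str


def check_missing(domain, field, values, threshold=0.10):
    """Flag a field whose fraction of missing (None) values exceeds `threshold`."""
    if not values:
        return []
    frac = sum(v is None for v in values) / len(values)
    if frac > threshold:
        return [Issue(domain, "missing data", f"{field}: {frac:.0%} missing")]
    return []


def check_outliers(domain, field, values, k=1.5):
    """Flag non-missing values outside Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    if len(present) < 4:  # too few values for meaningful quartiles
        return []
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in present if v < low or v > high]
    if flagged:
        return [Issue(domain, "outlier",
                      f"{field}: {len(flagged)} value(s) outside [{low:.1f}, {high:.1f}]")]
    return []


# Hypothetical extract: body weights (kg) with two gaps and one implausible entry.
weights = [3.2, 12.5, 20.1, None, 18.4, 15.0, 950.0, None, 22.3, 17.8]

issues = (check_missing("measurement", "weight_kg", weights)
          + check_outliers("measurement", "weight_kg", weights))

# Tally flagged issues by type before a reviewer annotates their causes.
summary = Counter(issue.check_type for issue in issues)
for issue in issues:
    print(issue.check_type, "|", issue.detail)
print(summary)
```

Tukey fences are used here only because they are a simple rule that does not assume any particular distribution. In pediatric data, plausible values depend heavily on age, so a realistic check would probably need to stratify by age group. The abstract also makes clear that automated flags are only a starting point. Attributing each issue to a cause, such as source data characteristics or extract-transform-load errors, required human review and discussion with the data partners.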

Cited Authors

  • Khare, R; Utidjian, L; Ruth, BJ; Kahn, MG; Burrows, E; Marsolo, K; Patibandla, N; Razzaghi, H; Colvin, R; Ranade, D; Kitzmiller, M; Eckrich, D; Bailey, LC

Published Date

  • November 2017

Published In

  • Journal of the American Medical Informatics Association

Volume / Issue

  • 24 / 6

Start / End Page

  • 1072 - 1079

PubMed ID

  • 28398525

Pubmed Central ID

  • 28398525

Electronic International Standard Serial Number (EISSN)

  • 1527-974X

International Standard Serial Number (ISSN)

  • 1067-5027

Digital Object Identifier (DOI)

  • 10.1093/jamia/ocx033

Language

  • English