The challenges of integrating diverse data sources: A case study in major depression
Combining data from diverse sources, including randomized controlled trials (RCTs) and observational datasets, holds the potential to increase sample size, improve external validity, and provide a well-rounded view of the question under study. However, the practical implementation of integrating different data sources can be complicated, particularly when considering data collected across sites and institutions. In this paper, we use a case study of data from four RCTs and two electronic health record (EHR) systems to illustrate some of the challenges that can arise when combining these various sources of data. We group the challenges into cohort- and variable-related challenges, and for each challenge, we provide descriptive statistics and visuals from our case study to show the decisions that must be made and the subsequent implications. We provide guidance for researchers on the most important considerations and emphasize the necessity for careful, documented decision-making by an interdisciplinary team. Through this case study and associated reflections, we highlight the dangers of naively combining data and advocate for a discussion and clear communication of the decisions made at each step in the data combination process, as well as the limitations and implications of those decisions.
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Related Subject Headings
- Health Policy & Services
- 35 Commerce, management, tourism and services
- 15 Commerce, Management, Tourism and Services