How and when informative visit processes can bias inference when using electronic health records data for clinical research.
OBJECTIVE: Electronic health records (EHR) data have become a central data source for clinical research. One concern with using EHR data is that the process through which individuals engage with the health system, and find themselves within EHR data, can be informative. We have termed this process informed presence. In this study we use simulation and real data to assess how informed presence can affect inference. MATERIALS AND METHODS: We first simulated a visit process where a series of biomarkers were observed informatively and uninformatively over time. We further compared inference derived from a randomized controlled trial (ie, uninformative visits) and EHR data (ie, potentially informative visits). RESULTS: We find that bias arises only when there is a strong association both between the biomarker and the outcome and between the biomarker and the visit process. Moreover, once there are some uninformative visits, this bias is mitigated. In the data example, we find that when the "true" associations are null, there is no observed bias. DISCUSSION: These results suggest that an informative visit process can exaggerate an association but cannot induce one. Furthermore, careful study design can mitigate the potential bias when some noninformative visits are included. CONCLUSIONS: While there are legitimate concerns regarding biases that "messy" EHR data may induce, the conditions for such biases are extreme and can be accounted for.
Goldstein, BA; Phelan, M; Pagidipati, NJ; Peskoe, SB