Integrated analysis for electronic health records with structured and sporadic missingness.

Publication, Journal Article
Tan, J; Zhang, Y; Hong, C; Cai, TT; Cai, T; Zhang, AR
Published in: J Biomed Inform
November 2025

OBJECTIVES: We propose a novel imputation method tailored for Electronic Health Records (EHRs) with structured and sporadic missingness. Such missingness frequently arises in the integration of heterogeneous EHR datasets for downstream clinical applications. By addressing these gaps, our method provides a practical solution for integrated analysis, enhancing data utility and advancing the understanding of population health.

MATERIALS AND METHODS: We begin by demonstrating structured and sporadic missing mechanisms in the integrated analysis of EHR data. Following this, we introduce a novel imputation framework, Macomss, specifically designed to handle structurally and heterogeneously occurring missing data. We establish theoretical guarantees for Macomss, ensuring its robustness in preserving the integrity and reliability of integrated analyses. To assess its empirical performance, we conduct extensive simulation studies that replicate the complex missingness patterns observed in real-world EHR systems, complemented by validation using EHR datasets from the Duke University Health System (DUHS).

RESULTS: Simulation studies show that our approach consistently outperforms existing imputation methods. Using datasets from three hospitals within DUHS, Macomss achieves the lowest imputation errors for missing data in most cases and provides superior or comparable downstream prediction performance compared to benchmark methods.

DISCUSSION: The proposed method effectively addresses critical missingness patterns that arise in the integrated analysis of EHR datasets, enhancing the robustness and generalizability of clinical predictions.

CONCLUSIONS: We provide a theoretically guaranteed and practically meaningful method for imputing structured and sporadic missing data, enabling accurate and reliable integrated analysis across multiple EHR datasets. The proposed approach holds significant potential for advancing research in population health.
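To make the two missingness patterns named in the abstract concrete, the sketch below builds a toy observation mask in Python. It is illustrative only and is not the Macomss method, which the abstract does not specify. It assumes that "structured" missingness can be pictured as a whole block of features being unrecorded for one group of patients (for example, one hospital), and that "sporadic" missingness can be pictured as scattered individual entries. The function names `make_missing_mask` and `column_mean_impute` are hypothetical, and column-mean imputation stands in only as a simple baseline.

```python
import numpy as np


def make_missing_mask(n_rows, n_cols, block_rows, block_cols, sporadic_rate, seed=0):
    """Return a boolean mask (True = observed) combining both patterns.

    Assumption: structured missingness is modelled as one fully unobserved
    block; sporadic missingness as independent random dropouts.
    """
    rng = np.random.default_rng(seed)
    observed = np.ones((n_rows, n_cols), dtype=bool)
    # Structured: every entry in the (block_rows x block_cols) block is missing.
    observed[np.ix_(list(block_rows), list(block_cols))] = False
    # Sporadic: each entry is dropped independently with probability sporadic_rate.
    sporadic = rng.random((n_rows, n_cols)) < sporadic_rate
    observed &= ~sporadic
    return observed


def column_mean_impute(X, observed):
    """Baseline imputer: fill each missing entry with its column's observed mean.

    Columns with no observed entries are filled with 0.0.
    """
    counts = observed.sum(axis=0)
    sums = np.where(observed, X, 0.0).sum(axis=0)
    col_means = sums / np.maximum(counts, 1)
    return np.where(observed, X, col_means)


if __name__ == "__main__":
    X = np.arange(24, dtype=float).reshape(6, 4)
    # Rows 3-5 (a hypothetical "hospital") never record features 2 and 3.
    mask = make_missing_mask(6, 4, block_rows=[3, 4, 5], block_cols=[2, 3],
                             sporadic_rate=0.1)
    X_hat = column_mean_impute(X, mask)
    rmse = np.sqrt(np.mean((X_hat[~mask] - X[~mask]) ** 2))
    print("fraction observed:", mask.mean())
    print("RMSE on missing entries:", rmse)
```

A baseline like this ignores any relationship between features, so it gives a floor against which a dedicated imputation method could be compared on the held-out (missing) entries.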

Published In

J Biomed Inform

DOI

10.1016/j.jbi.2025.104933

EISSN

1532-0480

Publication Date

November 2025

Volume

171

Start / End Page

104933

Location

United States

Related Subject Headings

  • Reproducibility of Results
  • Medical Informatics
  • Humans
  • Electronic Health Records
  • Computer Simulation
  • Biomedical Engineering
  • Algorithms
  • 4601 Applied computing
  • 4203 Health services and systems
 

Citation

APA
Tan, J., Zhang, Y., Hong, C., Cai, T. T., Cai, T., & Zhang, A. R. (2025). Integrated analysis for electronic health records with structured and sporadic missingness. J Biomed Inform, 171, 104933. https://doi.org/10.1016/j.jbi.2025.104933

Chicago
Tan, Jianbin, Yan Zhang, Chuan Hong, T Tony Cai, Tianxi Cai, and Anru R. Zhang. “Integrated analysis for electronic health records with structured and sporadic missingness.” J Biomed Inform 171 (November 2025): 104933. https://doi.org/10.1016/j.jbi.2025.104933.

ICMJE
Tan J, Zhang Y, Hong C, Cai TT, Cai T, Zhang AR. Integrated analysis for electronic health records with structured and sporadic missingness. J Biomed Inform. 2025 Nov;171:104933.

MLA
Tan, Jianbin, et al. “Integrated analysis for electronic health records with structured and sporadic missingness.” J Biomed Inform, vol. 171, Nov. 2025, p. 104933. Pubmed, doi:10.1016/j.jbi.2025.104933.

NLM
Tan J, Zhang Y, Hong C, Cai TT, Cai T, Zhang AR. Integrated analysis for electronic health records with structured and sporadic missingness. J Biomed Inform. 2025 Nov;171:104933.