Scholars@Duke publication: EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models

Publication, Conference

Matos, J; Gallifant, J; Pei, J; Wong, AI

Published in: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

January 1, 2025

Electronic health records (EHRs) contain vast amounts of complex data, but harmonizing and processing this information remains a challenging and costly task that requires significant clinical expertise. While large language models (LLMs) have shown promise in various healthcare applications, their potential for abstracting medical concepts from EHRs remains largely unexplored. We introduce EHRmonize, a framework that leverages LLMs to abstract medical concepts from EHR data. Our study uses medication data from two real-world EHR databases to evaluate five LLMs on two free-text extraction tasks and six binary classification tasks across various prompting strategies. GPT-4o with 10-shot prompting achieved the highest performance in all tasks, as did Claude-3.5-Sonnet in a subset of tasks. GPT-4o achieved an accuracy of 97% in identifying generic route names, 82% for generic drug names, and 100% in the binary classification of antibiotics. While EHRmonize significantly enhances efficiency, reducing annotation time by an estimated 60%, we emphasize that clinician oversight remains essential. Our framework, available as a Python package (on PyPI, with its repository on GitHub and documentation on ReadTheDocs), offers a promising tool to assist clinicians in EHR data abstraction. EHRmonize has the potential to accelerate healthcare research and improve data harmonization processes.
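The abstract describes few-shot prompting for binary classification tasks (for example, deciding whether a medication is an antibiotic) evaluated by accuracy. The Python sketch below illustrates that general pattern only; it is not EHRmonize's API. The `Example` record, the prompt template, the yes/no parsing rule, and `stub_model` (a stand-in for a real LLM call) are all hypothetical, and the drugs are toy examples rather than data from the study. The best-performing setup reported in the abstract used k = 10 shots; two are used here for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """One labeled medication record (hypothetical schema, not EHRmonize's)."""
    drug: str
    route: str
    label: bool  # e.g. True if the drug is an antibiotic


def build_k_shot_prompt(question: str, shots: Sequence[Example], query: Example) -> str:
    """Assemble a k-shot prompt: task question, k solved examples, then the unsolved query."""
    lines = [question, ""]
    for ex in shots:
        lines.append(f"Drug: {ex.drug} | Route: {ex.route}")
        lines.append(f"Answer: {'Yes' if ex.label else 'No'}")
        lines.append("")
    lines.append(f"Drug: {query.drug} | Route: {query.route}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_yes_no(reply: str) -> bool:
    """Map a free-text reply to a boolean; anything not starting with 'yes' counts as False."""
    return reply.strip().lower().startswith("yes")


def accuracy(
    model: Callable[[str], str],
    question: str,
    shots: Sequence[Example],
    test_set: Sequence[Example],
) -> float:
    """Fraction of test records whose parsed model answer matches the reference label."""
    if not test_set:
        raise ValueError("test_set must be non-empty")
    correct = 0
    for ex in test_set:
        prompt = build_k_shot_prompt(question, shots, ex)
        correct += parse_yes_no(model(prompt)) == ex.label
    return correct / len(test_set)


if __name__ == "__main__":
    shots = [Example("vancomycin", "IV", True), Example("metoprolol", "PO", False)]
    tests = [Example("ceftriaxone", "IV", True), Example("heparin", "SC", False)]

    def stub_model(prompt: str) -> str:
        # Hypothetical stand-in for an LLM call: inspects only the final query line.
        query = prompt.rsplit("Drug: ", 1)[1]
        return "Yes" if query.startswith(("vancomycin", "ceftriaxone")) else "No"

    q = "Is the following medication an antibiotic? Answer Yes or No."
    print(accuracy(stub_model, q, shots, tests))  # prints 1.0
```

Passing the model in as a plain callable keeps prompt construction and scoring independent of any particular LLM provider; in practice the reference labels would come from clinician annotation, consistent with the abstract's emphasis on clinician oversight.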

Duke Scholars

Author: Jian Pei (Computer Science)

Published In

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

DOI

10.1007/978-3-031-82007-6_20

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2025

Volume

15384 LNCS

Start / End Page

210 / 220

Related Subject Headings

Artificial Intelligence & Image Processing
46 Information and computing sciences

Citation

APA

Matos, J., Gallifant, J., Pei, J., & Wong, A. I. (2025). EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models. In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics (Vol. 15384 LNCS, pp. 210–220). https://doi.org/10.1007/978-3-031-82007-6_20

Chicago

Matos, J., J. Gallifant, J. Pei, and A. I. Wong. “EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models.” In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 15384 LNCS:210–20, 2025. https://doi.org/10.1007/978-3-031-82007-6_20.

ICMJE

Matos J, Gallifant J, Pei J, Wong AI. EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models. In: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2025. p. 210–20.

MLA

Matos, J., et al. “EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models.” Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 15384 LNCS, 2025, pp. 210–20. Scopus, doi:10.1007/978-3-031-82007-6_20.

NLM

Matos J, Gallifant J, Pei J, Wong AI. EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models. Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2025. p. 210–220.