Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters

Publication, Conference
Alwakeel, M; Buck, E; Martin, JG; Aslam, I; Rajagopal, S; Pei, J; Podgoreanu, MV; Lindsell, CJ; Wong, AKI
Published in: Lecture Notes in Computer Science
January 1, 2026

Pulmonary embolism (PE) registries accelerate practice-improving research but depend on resource-intensive manual abstraction of radiology reports. We evaluated whether openly available large language models (LLMs) can automate concept extraction from computed-tomography PE (CTPE) reports without sacrificing data quality. Four Llama-3 (L3) variants (3.0 8B, 3.1 8B, 3.1 70B, 3.3 70B) and two reviewer models, Phi-4 14B (P4) and Gemma-3 27B (G3), were tested on 250 dual-annotated CTPE reports each from MIMIC-IV and Duke University. Outcomes were accuracy, positive predictive value (PPV), and negative predictive value (NPV) versus a human gold standard across model sizes, temperature settings, and shot counts. Mean accuracy across all concepts increased with scale: 0.83 (L3–0 8B), 0.91 (L3–1 8B), and 0.96 for both 70B variants; P4 14B achieved 0.98, and G3 matched it. Accuracy differed by < 0.03 between datasets, underscoring external robustness. In dual-model concordance analysis (L3 70B + P4 14B), PE-presence PPV was ≥ 0.95 and NPV ≥ 0.98, while location, thrombus burden, right-heart strain, and image-quality artifacts each maintained PPV ≥ 0.90 and NPV ≥ 0.95. Fewer than 4% of individual concept annotations were discordant, and complete agreement was observed in more than 75% of reports. G3 performed comparably. LLMs therefore offer a scalable, accurate solution for PE registry abstraction, and a dual-model review workflow can further safeguard data quality with minimal human oversight.
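
The abstract's outcome measures (accuracy, PPV, NPV) and its dual-model concordance check can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `binary_metrics` and `split_by_concordance`, the `Metrics` container, and the toy labels are all hypothetical, and the sketch assumes each concept has already been reduced to one boolean label per report.

```python
from dataclasses import dataclass
from math import nan


@dataclass
class Metrics:
    accuracy: float
    ppv: float
    npv: float


def binary_metrics(pred, gold):
    """Score boolean predictions against boolean gold-standard labels."""
    pairs = list(zip(pred, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    total = tp + fp + tn + fn
    return Metrics(
        accuracy=(tp + tn) / total if total else nan,
        ppv=tp / (tp + fp) if (tp + fp) else nan,  # correct among predicted positives
        npv=tn / (tn + fn) if (tn + fn) else nan,  # correct among predicted negatives
    )


def split_by_concordance(model_a, model_b):
    """Return (agree, disagree) report indices for two models' labels."""
    agree, disagree = [], []
    for i, (a, b) in enumerate(zip(model_a, model_b)):
        (agree if a == b else disagree).append(i)
    return agree, disagree


# Toy labels for one binary concept (e.g. "PE present"); invented for illustration only.
gold = [True, True, False, False, True, False]
model_a = [True, True, False, True, True, False]
model_b = [True, False, False, False, True, False]

# Single-model performance against the human gold standard.
single = binary_metrics(model_a, gold)

# Dual-model view: score only the reports where both models agree.
agree, disagree = split_by_concordance(model_a, model_b)
concordant = binary_metrics([model_a[i] for i in agree], [gold[i] for i in agree])
```

In this toy data the two models disagree on reports 1 and 3; under a dual-model review workflow of the kind the abstract describes, those would be the candidates for human review, while the concordant reports are scored as a group.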

Published In

Lecture Notes in Computer Science

DOI

10.1007/978-3-032-09569-5_21

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2026

Volume

16206 LNCS

Start / End Page

206 / 215

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA: Alwakeel, M., Buck, E., Martin, J. G., Aslam, I., Rajagopal, S., Pei, J., … Wong, A. K. I. (2026). Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters. In Lecture Notes in Computer Science (Vol. 16206 LNCS, pp. 206–215). https://doi.org/10.1007/978-3-032-09569-5_21
Chicago: Alwakeel, M., E. Buck, J. G. Martin, I. Aslam, S. Rajagopal, J. Pei, M. V. Podgoreanu, C. J. Lindsell, and A. K. I. Wong. “Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters.” In Lecture Notes in Computer Science, 16206 LNCS:206–15, 2026. https://doi.org/10.1007/978-3-032-09569-5_21.
ICMJE: Alwakeel M, Buck E, Martin JG, Aslam I, Rajagopal S, Pei J, et al. Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters. In: Lecture Notes in Computer Science. 2026. p. 206–15.
MLA: Alwakeel, M., et al. “Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters.” Lecture Notes in Computer Science, vol. 16206 LNCS, 2026, pp. 206–15. Scopus, doi:10.1007/978-3-032-09569-5_21.
NLM: Alwakeel M, Buck E, Martin JG, Aslam I, Rajagopal S, Pei J, Podgoreanu MV, Lindsell CJ, Wong AKI. Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters. Lecture Notes in Computer Science. 2026. p. 206–215.
