Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters
Pulmonary embolism (PE) registries accelerate practice-improving research but depend on resource-intensive manual abstraction of radiology reports. We evaluated whether openly available large-language models (LLMs) can automate concept extraction from computed-tomography PE (CTPE) reports without sacrificing data quality. Four Llama-3 (L3) variants (3.0 8 B, 3.1 8 B, 3.1 70 B, 3.3 70 B) and two reviewer models Phi-4 (P4) 14 B and Gemma‑3 27 B (G3) were tested on 250 dual-annotated CTPE reports each from MIMIC-IV and Duke University. Outcomes were accuracy, positive predictive value (PPV), and negative predictive value (NPV) versus a human gold standard across model sizes, temperature settings, and shot counts. Mean accuracy across all concepts increased with scale: 0.83 (L3–0 8 B), 0.91 (L3–1 8 B), and 0.96 for both 70 B variants; P4 14 B achieved 0.98; G3 matched. Accuracy differed by < 0.03 between datasets, underscoring external robustness. In dual-model concordance analysis (L3 70 B + P4 14 B), PE-presence PPV was ≥ 0.95 and NPV ≥ 0.98, while location, thrombus burden, right-heart strain, and image-quality artifacts each maintained PPV ≥ 0.90 and NPV ≥ 0.95. Fewer than 4% of individual concept annotations were discordant, and complete agreement was observed in more than 75% of reports. G3 performed comparably. LLMs therefore offer a scalable, accurate solution for PE registry abstraction, and a dual-model review workflow can further safeguard data quality with minimal human oversight.
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Start / End Page
Related Subject Headings
- Artificial Intelligence & Image Processing
- 46 Information and computing sciences
Citation
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Start / End Page
Related Subject Headings
- Artificial Intelligence & Image Processing
- 46 Information and computing sciences