Using Open-Source Large Language Models to Identify Access to Germline Genetic Testing in Veterans With Breast Cancer From Unstructured Text.

Publication, Journal Article
Li, C; Stringer, M; Patil, V; Mcshinsky, R; Morreall, D; Yong, C; Rasmussen, KM; Burningham, Z; Tamang, S; Menendez, CS; Chiba, A; Moss, HA ...
Published in: JCO Clin Cancer Inform
July 2025

PURPOSE: The ability of large language models (LLMs) to identify access to germline genetic testing from unstructured text remains unknown. The Department of Veterans Affairs (VA) assessed access in Veterans with breast cancer by implementing open-source, locally deployable LLMs (Llama 3 70B, Llama 3 8B, and Llama 2 70B) and evaluating their performance in identifying access from clinical/consult notes.

METHODS: We identified a cohort of 1,201 Veterans diagnosed with breast cancer between January 1, 2021, and December 31, 2022, who received cancer care within the nationwide VA system and had clinical and/or consult notes available. Notes from a subset of 200 randomly selected patients, reviewed by subject-matter experts to identify access to testing, were split into development and testing sets, and various hyperparameters and prompting approaches were applied. We evaluated LLM performance using accuracy, precision, recall, and F1, with expert consensus on the labeled subset serving as ground truth. We compared LLM-identified access distribution in the entire cohort with expert-identified access in the labeled subset using the chi-squared test.

RESULTS: Llama 3 70B achieved an F1 score of 0.912 (95% CI, 0.853 to 0.971), besting Llama 3 8B (F1: 0.811; 95% CI, 0.720 to 0.901) and significantly outperforming Llama 2 70B (F1: 0.644; 95% CI, 0.514 to 0.773). The test set target variable prevalence was 0.72. We observed no significant difference between the performance of Llama 3 70B and that of the average individual expert reviewer, nor between LLM-identified access distribution across the entire cohort and expert-identified distribution in the labeled subset.

CONCLUSION: An open-source, locally deployable LLM effectively and efficiently identified germline genetic testing access from clinical notes. LLMs may enhance care quality and efficiency while safeguarding sensitive data.
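The evaluation described in the abstract (accuracy, precision, recall, and F1 against expert labels, plus a chi-squared comparison of two access distributions) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the abstract does not say how the 95% confidence intervals were computed (a percentile bootstrap is assumed here), and the chi-squared test is assumed to be a Pearson test on a 2x2 table without continuity correction.

```python
import math
import random


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = access identified)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def bootstrap_f1_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample patients with replacement and recompute F1.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_true = [y_true[i] for i in idx]
        sample_pred = [y_pred[i] for i in idx]
        scores.append(binary_metrics(sample_true, sample_pred)["f1"])
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (df = 1) for [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For one degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    expert = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
    model = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    print(binary_metrics(expert, model))
    print(bootstrap_f1_ci(expert, model))
    print(chi_squared_2x2(20, 10, 10, 20))
```

In such a setup, the rows of the 2x2 table would be the two groups being compared (for example, the LLM-labeled cohort and the expert-labeled subset) and the columns the counts with and without identified access.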

Published In

JCO Clin Cancer Inform

DOI

10.1200/CCI-24-00263

EISSN

2473-4276

Publication Date

July 2025

Volume

9

Start / End Page

e2400263

Location

United States

Related Subject Headings

  • Veterans
  • United States
  • Middle Aged
  • Large Language Models
  • Humans
  • Germ-Line Mutation
  • Genetic Testing
  • Female
  • Breast Neoplasms
  • Aged
 

Citation

APA: Li, C., Stringer, M., Patil, V., Mcshinsky, R., Morreall, D., Yong, C., … Halwani, A. (2025). Using Open-Source Large Language Models to Identify Access to Germline Genetic Testing in Veterans With Breast Cancer From Unstructured Text. JCO Clin Cancer Inform, 9, e2400263. https://doi.org/10.1200/CCI-24-00263
Chicago: Li, Chunyang, Michael Stringer, Vikas Patil, Richard Mcshinsky, Deborah Morreall, Christina Yong, Kelli M. Rasmussen, et al. “Using Open-Source Large Language Models to Identify Access to Germline Genetic Testing in Veterans With Breast Cancer From Unstructured Text.” JCO Clin Cancer Inform 9 (July 2025): e2400263. https://doi.org/10.1200/CCI-24-00263.
ICMJE: Li C, Stringer M, Patil V, Mcshinsky R, Morreall D, Yong C, et al. Using Open-Source Large Language Models to Identify Access to Germline Genetic Testing in Veterans With Breast Cancer From Unstructured Text. JCO Clin Cancer Inform. 2025 Jul;9:e2400263.
MLA: Li, Chunyang, et al. “Using Open-Source Large Language Models to Identify Access to Germline Genetic Testing in Veterans With Breast Cancer From Unstructured Text.” JCO Clin Cancer Inform, vol. 9, July 2025, p. e2400263. PubMed, doi:10.1200/CCI-24-00263.
NLM: Li C, Stringer M, Patil V, Mcshinsky R, Morreall D, Yong C, Rasmussen KM, Burningham Z, Tamang S, Menendez CS, Chiba A, Moss HA, Colonna S, Rowe K, Friedman D, Kelley MJ, Halwani A. Using Open-Source Large Language Models to Identify Access to Germline Genetic Testing in Veterans With Breast Cancer From Unstructured Text. JCO Clin Cancer Inform. 2025 Jul;9:e2400263.
