Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness.

Publication, Journal Article
Ke, YH; Jin, L; Elangovan, K; Abdullah, HR; Liu, N; Sia, ATH; Soh, CR; Tung, JYM; Ong, JCL; Kuo, C-F; Wu, S-C; Kovacheva, VP; Ting, DSW
Published in: NPJ Digit Med
April 5, 2025

Large Language Models (LLMs) hold promise for medical applications but often lack domain-specific expertise. Retrieval Augmented Generation (RAG) enables customization by integrating specialized knowledge. This study assessed the accuracy, consistency, and safety of LLM-RAG models in determining surgical fitness and delivering preoperative instructions using 35 local and 23 international guidelines. Ten LLMs (e.g., GPT3.5, GPT4, GPT4o, Gemini, Llama2, Llama3, and Claude) were tested across 14 clinical scenarios. A total of 3234 responses were generated and compared to 448 human-generated answers. The GPT4 LLM-RAG model with international guidelines generated answers within 20 s and achieved the highest accuracy, which was significantly better than human-generated responses (96.4% vs. 86.6%, p = 0.016). Additionally, the model exhibited an absence of hallucinations and produced more consistent output than humans. This study underscores the potential of GPT-4-based LLM-RAG models to deliver highly accurate, efficient, and consistent preoperative assessments.

Published In

NPJ Digit Med

DOI

10.1038/s41746-025-01519-z

EISSN

2398-6352

Publication Date

April 5, 2025

Volume

8

Issue

1

Start / End Page

187

Location

England

Related Subject Headings

  • 4203 Health services and systems
 

Citation

APA: Ke, Y. H., Jin, L., Elangovan, K., Abdullah, H. R., Liu, N., Sia, A. T. H., … Ting, D. S. W. (2025). Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness. NPJ Digit Med, 8(1), 187. https://doi.org/10.1038/s41746-025-01519-z
Chicago: Ke, Yu He, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, et al. “Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness.” NPJ Digit Med 8, no. 1 (April 5, 2025): 187. https://doi.org/10.1038/s41746-025-01519-z.
ICMJE: Ke YH, Jin L, Elangovan K, Abdullah HR, Liu N, Sia ATH, et al. Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness. NPJ Digit Med. 2025 Apr 5;8(1):187.
MLA: Ke, Yu He, et al. “Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness.” NPJ Digit Med, vol. 8, no. 1, Apr. 2025, p. 187. PubMed, doi:10.1038/s41746-025-01519-z.
NLM: Ke YH, Jin L, Elangovan K, Abdullah HR, Liu N, Sia ATH, Soh CR, Tung JYM, Ong JCL, Kuo C-F, Wu S-C, Kovacheva VP, Ting DSW. Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness. NPJ Digit Med. 2025 Apr 5;8(1):187.
