Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists.

Publication, Journal Article
Sheng, L; Chen, Y; Wei, H; Che, F; Wu, Y; Qin, Q; Yang, C; Wang, Y; Peng, J; Bashir, MR; Ronot, M; Song, B; Jiang, H
Published in: Liver Int
June 2025

BACKGROUND & AIMS: Whether large language models (LLMs) can be integrated into the diagnostic workflow for focal liver lesions (FLLs) remains unclear. We aimed to investigate the diagnostic accuracy of two generic LLMs (ChatGPT-4o and Gemini) working from CT/MRI reports, compared to and combined with radiologists of different experience levels.
METHODS: From April 2022 to April 2024, this single-center retrospective study included consecutive adult patients who underwent contrast-enhanced CT/MRI for a single FLL and subsequent histopathologic examination. The LLMs were prompted three times with clinical information and the "findings" section of radiology reports to provide differential diagnoses in descending order of likelihood, with the first considered the final diagnosis. In the research setting, six radiologists (three junior and three middle-level) independently reviewed the CT/MRI images and clinical information in two rounds (first alone, then with LLM assistance). In the clinical setting, diagnoses were retrieved from the "impressions" section of radiology reports. Diagnostic accuracy was assessed against histopathology.
RESULTS: A total of 228 patients (median age, 59 years; 155 males) with 228 FLLs (median size, 3.6 cm) were included. For the final diagnosis, the accuracy of two-step ChatGPT-4o (78.9%) was higher than that of single-step ChatGPT-4o (68.0%, p < 0.001) and single-step Gemini (73.2%, p = 0.004), similar to that of real-world radiology reports (80.0%, p = 0.34) and junior radiologists (78.9%-82.0%; p-values, 0.21 to > 0.99), but lower than that of middle-level radiologists (84.6%-85.5%; p-values, 0.001 to 0.02). No incremental diagnostic value of ChatGPT-4o was observed for any radiologist (p-values, 0.63 to > 0.99).
CONCLUSION: Two-step ChatGPT-4o matched the accuracy of real-world radiology reports and junior radiologists for diagnosing FLLs but was less accurate than middle-level radiologists and demonstrated little incremental diagnostic value.

Duke Scholars

Published In

Liver Int

DOI

10.1111/liv.70115

EISSN

1478-3231

Publication Date

June 2025

Volume

45

Issue

6

Start / End Page

e70115

Location

United States

Related Subject Headings

  • Tomography, X-Ray Computed
  • Retrospective Studies
  • Radiologists
  • Middle Aged
  • Male
  • Magnetic Resonance Imaging
  • Liver Neoplasms
  • Liver
  • Large Language Models
  • Humans
 

Citation

APA
Sheng, L., Chen, Y., Wei, H., Che, F., Wu, Y., Qin, Q., … Jiang, H. (2025). Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists. Liver Int, 45(6), e70115. https://doi.org/10.1111/liv.70115

Chicago
Sheng, Liuji, Yidi Chen, Hong Wei, Feng Che, Yingyi Wu, Qin Qin, Chongtu Yang, et al. “Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists.” Liver Int 45, no. 6 (June 2025): e70115. https://doi.org/10.1111/liv.70115.

ICMJE
Sheng L, Chen Y, Wei H, Che F, Wu Y, Qin Q, et al. Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists. Liver Int. 2025 Jun;45(6):e70115.

MLA
Sheng, Liuji, et al. “Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists.” Liver Int, vol. 45, no. 6, June 2025, p. e70115. Pubmed, doi:10.1111/liv.70115.

NLM
Sheng L, Chen Y, Wei H, Che F, Wu Y, Qin Q, Yang C, Wang Y, Peng J, Bashir MR, Ronot M, Song B, Jiang H. Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists. Liver Int. 2025 Jun;45(6):e70115.