Scholars@Duke publication: Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models.

Publication, Journal Article

Nagori, A; Gautam, A; Wiens, MO; Nguyen, V; Mugisha, NK; Kabakyenga, J; Kissoon, N; Ansermino, JM; Kamaleswaran, R

Published in: AMIA Annu Symp Proc

2024

Clustering patients into subgroups is essential for personalized care and efficient use of resources. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates clustering based on large language models (LLMs) against classical methods using a pediatric sepsis dataset from a low-income country (LIC) containing 2,686 records with 28 numerical variables and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation (LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP- and FAMD-reduced mixed data. Silhouette scores and statistical tests evaluated the quality and distinctiveness of the clusters. Stella-En-400M-V5 achieved the highest silhouette score (0.86). LLAMA 3.1 8B with the clustering objective performed better with a higher number of clusters, identifying subgroups with distinct nutritional, clinical, and socioeconomic profiles. LLM-based methods outperformed classical techniques by capturing richer context and prioritizing key features. These results highlight the potential of LLMs for contextual phenotyping and informed decision-making in resource-limited settings.
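The abstract says patient records were serialized into text, with and without a clustering objective, before being embedded. A minimal sketch of that step follows, assuming a simple "feature: value" template. The field names (`age_months`, `muac_cm`, `maternal_education`), the `serialize_record` helper, and the wording of `CLUSTERING_OBJECTIVE` are hypothetical illustrations, not the study's actual template or prompt.

```python
from typing import Mapping, Optional, Union

Value = Union[int, float, str, None]

# Hypothetical objective wording; the study's actual prompt is not given in the abstract.
CLUSTERING_OBJECTIVE = (
    "Represent this pediatric sepsis patient so that patients with similar "
    "clinical, nutritional, and socioeconomic profiles are grouped together."
)


def serialize_record(record: Mapping[str, Value], objective: Optional[str] = None) -> str:
    """Turn one patient record into a single line of 'feature: value' text.

    Numerical and categorical variables are written the same way; missing values
    are spelled out so the text encoder still sees that the field exists.
    """
    parts = []
    for name, value in record.items():
        text = "missing" if value is None else str(value)
        parts.append(f"{name.replace('_', ' ')}: {text}")
    body = "; ".join(parts) + "."
    # Without an objective, only the serialized features are embedded.
    return body if objective is None else f"{objective} {body}"


# Hypothetical record; variable names are illustrative only.
patient = {"age_months": 18, "sex": "female", "muac_cm": 11.2, "maternal_education": None}
print(serialize_record(patient))
print(serialize_record(patient, CLUSTERING_OBJECTIVE))
```

Each resulting string would then be passed to a text-embedding model, yielding one vector per patient for clustering.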
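Downstream, K-means was run on the embeddings and cluster quality was scored with the silhouette coefficient. The dependency-free sketch below shows both steps on toy two-dimensional vectors that stand in for LLM embeddings. It uses plain Lloyd's algorithm with deterministic farthest-first seeding. That seeding is an assumption chosen here for reproducibility and is not necessarily what the study used. `kmeans`, `farthest_first_init`, and `silhouette_score` are illustrative names, not a library API. In practice one would more likely call an established implementation such as scikit-learn's `KMeans` and `silhouette_score`.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def farthest_first_init(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption): start at points[0], then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm: assign each point to its nearest centroid, then move each
    centroid to the mean of its members, until the assignment stops changing."""
    centroids = farthest_first_init(points, k)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    assert labels is not None, "max_iter must be at least 1"
    return labels


def silhouette_score(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette s(i) = (b - a) / max(a, b), where a is the mean distance to
    the point's own cluster and b the smallest mean distance to another cluster.
    Requires at least two clusters; points in singleton clusters score 0."""
    n = len(points)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # s(i) = 0 for a singleton cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += 0.0 if denom == 0 else (b - a) / denom
    return total / n


# Hypothetical toy vectors standing in for embeddings of serialized patient records.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = kmeans(embeddings, k=2)
print(labels, round(silhouette_score(embeddings, labels), 3))
```

The silhouette ranges from -1 to 1. Values near 1, such as the 0.86 reported for Stella-En-400M-V5, indicate that points sit much closer to their own cluster than to the nearest other cluster.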

Duke Scholars

Author: Rishi Kamaleswaran, Trauma, Acute, and Critical Care Surgery

Published In

AMIA Annu Symp Proc

EISSN

1942-597X

Publication Date

2024

Volume

2024

Start / End Page

929 / 938

Location

United States

Related Subject Headings

Sepsis
Phenotype
Natural Language Processing
Large Language Models
Humans
Cohort Studies
Cluster Analysis
Child

Citation

APA: Nagori, A., Gautam, A., Wiens, M. O., Nguyen, V., Mugisha, N. K., Kabakyenga, J., … Kamaleswaran, R. (2024). Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models. AMIA Annu Symp Proc, 2024, 929–938.

Chicago: Nagori, Aditya, Ayush Gautam, Matthew O. Wiens, Vuong Nguyen, Nathan Kenya Mugisha, Jerome Kabakyenga, Niranjan Kissoon, John Mark Ansermino, and Rishikesan Kamaleswaran. “Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models.” AMIA Annu Symp Proc 2024 (2024): 929–38.

ICMJE: Nagori A, Gautam A, Wiens MO, Nguyen V, Mugisha NK, Kabakyenga J, et al. Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models. AMIA Annu Symp Proc. 2024;2024:929–38.

MLA: Nagori, Aditya, et al. “Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models.” AMIA Annu Symp Proc, vol. 2024, 2024, pp. 929–38.

NLM: Nagori A, Gautam A, Wiens MO, Nguyen V, Mugisha NK, Kabakyenga J, Kissoon N, Ansermino JM, Kamaleswaran R. Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models. AMIA Annu Symp Proc. 2024;2024:929–938.