Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm.

Publication, Journal Article
Joo, YY; Pacheco, JA; Thompson, WK; Rasmussen-Torvik, LJ; Rasmussen, LV; Lin, FTJ; Andrade, MD; Borthwick, KM; Bottinger, E; Cagan, A; Shang, N ...
Published in: Plos One
2023

OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD by applying a Natural Language Processing (NLP) technique to multiple electronic health record (EHR) data sources covering 91,166 multi-ancestry participants.
MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European-ancestry, African-ancestry, and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid or pleiotropic effects on clinical phenotypes.
RESULTS: The algorithm significantly improved patient classification performance for DD analysis (algorithm PPVs ≥ 0.94) and identified up to 3.5-fold more patients than the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis in the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, with overall stronger GWAS signals in diverticulitis patients than in diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes.
DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, this work demonstrated that heterogeneous EHR data can be mapped through an integrative analytical pipeline to reveal significant, clinically interpretable genotype-phenotype associations.
CONCLUSION: A systematic framework for processing unstructured EHR data with NLP could enable deep, scalable phenotyping for better patient identification and facilitate etiological investigation of a disease using multilayered data.

Published In

Plos One

DOI

10.1371/journal.pone.0283553

EISSN

1932-6203

Publication Date

2023

Volume

18

Issue

5

Start / End Page

e0283553

Location

United States

Related Subject Headings

  • Polymorphism, Single Nucleotide
  • Phenotype
  • Natural Language Processing
  • Humans
  • Genome-Wide Association Study
  • General Science & Technology
  • Electronic Health Records
  • Diverticulum
  • Diverticulitis
  • Diverticular Diseases
 

Citation

Joo, Y. Y., Pacheco, J. A., Thompson, W. K., Rasmussen-Torvik, L. J., Rasmussen, L. V., Lin, F. T. J., … Kho, A. N. (2023). Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm. Plos One, 18(5), e0283553. https://doi.org/10.1371/journal.pone.0283553
Joo, Yoonjung Yoonie, Jennifer A. Pacheco, William K. Thompson, Laura J. Rasmussen-Torvik, Luke V. Rasmussen, Frederick T. J. Lin, Mariza de Andrade, et al. “Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm.” Plos One 18, no. 5 (2023): e0283553. https://doi.org/10.1371/journal.pone.0283553.
Joo, Yoonjung Yoonie, et al. “Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm.” Plos One, vol. 18, no. 5, 2023, p. e0283553. Pubmed, doi:10.1371/journal.pone.0283553.
Joo YY, Pacheco JA, Thompson WK, Rasmussen-Torvik LJ, Rasmussen LV, Lin FTJ, Andrade MD, Borthwick KM, Bottinger E, Cagan A, Carrell DS, Denny JC, Ellis SB, Gottesman O, Linneman JG, Pathak J, Peissig PL, Shang N, Tromp G, Veerappan A, Smith ME, Chisholm RL, Gawron AJ, Hayes MG, Kho AN. Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm. Plos One. 2023;18(5):e0283553.