Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer's disease prediction.

Publication, Journal Article
Gao, XR; Chiariglione, M; Qin, K; Nuytemans, K; Scharre, DW; Li, Y-J; Martin, ER
Published in: Sci Rep
January 9, 2023

Alzheimer's disease (AD) is the most common late-onset neurodegenerative disorder. Identifying individuals at increased risk of developing AD is important for early intervention. Using data from the Alzheimer Disease Genetics Consortium, we constructed polygenic risk scores (PRSs) for AD and age-at-onset (AAO) of AD for the UK Biobank participants. We then built machine learning (ML) models for predicting development of AD, and explored feature importance among PRSs, conventional risk factors, and ICD-10 codes from electronic health records, a total of > 11,000 features using the UK Biobank dataset. We used eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP), which provided superior ML performance as well as aided ML model explanation. For participants age 40 and older, the area under the curve for AD was 0.88. For subjects of age 65 and older (late-onset AD), PRSs were the most important predictors. This is the first observation that PRSs constructed from the AD risk and AAO play more important roles than age in predicting AD. The ML model also identified important predictors from EHR, including urinary tract infection, syncope and collapse, chest pain, disorientation and hypercholesterolemia, for developing AD. Our ML model improved the accuracy of AD risk prediction by efficiently exploring numerous predictors and identified novel feature patterns.
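As a rough illustration of two quantities named in the abstract, the sketch below computes a polygenic risk score in its usual form, a weighted sum of risk-allele dosages, and then the area under the ROC curve for a set of scores. It does not reproduce the study's XGBoost and SHAP pipeline. The function names, variant weights, genotypes, and labels are all hypothetical toy values chosen for illustration and are not taken from the paper.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count as half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical per-variant effect sizes (toy values, not from the study).
weights = [0.2, -0.1, 0.5]

# Hypothetical genotypes: one row per participant, one dosage per variant.
genotypes = [
    [2, 0, 1],
    [0, 2, 0],
    [1, 1, 2],
    [0, 0, 0],
]
labels = [1, 0, 1, 0]  # 1 = case, 0 = control (toy labels)

scores = [polygenic_risk_score(g, weights) for g in genotypes]
auc = roc_auc(labels, scores)
print(scores, auc)
```

The pairwise AUC here is quadratic in the number of participants, which is fine for a toy example; a rank-based implementation would be the natural choice at biobank scale.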

Published In

Sci Rep

DOI

10.1038/s41598-023-27551-1

EISSN

2045-2322

Publication Date

January 9, 2023

Volume

13

Issue

1

Start / End Page

450

Location

England

Related Subject Headings

  • Risk Factors
  • Machine Learning
  • Humans
  • Electronic Health Records
  • Alzheimer Disease
  • Aged
  • Adult
 

Citation

APA: Gao, X. R., Chiariglione, M., Qin, K., Nuytemans, K., Scharre, D. W., Li, Y.-J., & Martin, E. R. (2023). Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer's disease prediction. Sci Rep, 13(1), 450. https://doi.org/10.1038/s41598-023-27551-1

Chicago: Gao, Xiaoyi Raymond, Marion Chiariglione, Ke Qin, Karen Nuytemans, Douglas W. Scharre, Yi-Ju Li, and Eden R. Martin. “Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer's disease prediction.” Sci Rep 13, no. 1 (January 9, 2023): 450. https://doi.org/10.1038/s41598-023-27551-1.

ICMJE: Gao XR, Chiariglione M, Qin K, Nuytemans K, Scharre DW, Li Y-J, et al. Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer's disease prediction. Sci Rep. 2023 Jan 9;13(1):450.

MLA: Gao, Xiaoyi Raymond, et al. “Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer's disease prediction.” Sci Rep, vol. 13, no. 1, Jan. 2023, p. 450. PubMed, doi:10.1038/s41598-023-27551-1.

NLM: Gao XR, Chiariglione M, Qin K, Nuytemans K, Scharre DW, Li Y-J, Martin ER. Explainable machine learning aggregates polygenic risk scores and electronic health records for Alzheimer's disease prediction. Sci Rep. 2023 Jan 9;13(1):450.
