PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies.

Publication, Journal Article
Sinnott, JA; Cai, F; Yu, S; Hejblum, BP; Hong, C; Kohane, IS; Liao, KP
Published in: J Am Med Inform Assoc
October 1, 2018

OBJECTIVE: Standard approaches for large scale phenotypic screens using electronic health record (EHR) data apply thresholds, such as ≥2 diagnosis codes, to define subjects as having a phenotype. However, the variation in the accuracy of diagnosis codes can impair the power of such screens. Our objective was to develop and evaluate an approach which converts diagnosis codes into a probability of a phenotype (PheProb). We hypothesized that this alternate approach for defining phenotypes would improve power for genetic association studies.

METHODS: The PheProb approach employs unsupervised clustering to separate patients into 2 groups based on diagnosis codes. Subjects are assigned a probability of having the phenotype based on the number of diagnosis codes. This approach was developed using simulated EHR data and tested in a real world EHR cohort. In the latter, we tested the association between low density lipoprotein cholesterol (LDL-C) genetic risk alleles known for association with hyperlipidemia and hyperlipidemia codes (ICD-9 272.x). PheProb and thresholding approaches were compared.

RESULTS: Among n = 1462 subjects in the real world EHR cohort, the threshold-based p-values for association between the genetic risk score (GRS) and hyperlipidemia were 0.126 (≥1 code), 0.123 (≥2 codes), and 0.142 (≥3 codes). The PheProb approach produced the expected significant association between the GRS and hyperlipidemia: p = .001.

CONCLUSIONS: PheProb improves statistical power for association studies relative to standard thresholding approaches by leveraging information about the phenotype in the billing code counts. The PheProb approach has direct applications where efficient approaches are required, such as in Phenome-Wide Association Studies.
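
The contrast drawn in METHODS between thresholding and a probability derived from code counts can be illustrated with a minimal sketch. The abstract does not state which clustering model PheProb uses, so the two-component Poisson mixture fit by EM below is an assumption made purely for illustration, and the function names and toy code counts are hypothetical rather than taken from the paper.

```python
import math

def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass at count k with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def posterior_high(k, pi_hi, lam_lo, lam_hi):
    """Posterior probability that a subject with k codes is in the high-count group."""
    a = math.log(pi_hi) + poisson_logpmf(k, lam_hi)
    b = math.log(1.0 - pi_hi) + poisson_logpmf(k, lam_lo)
    m = max(a, b)  # subtract the max for numerical stability
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))

def fit_two_group_mixture(counts, n_iter=200):
    """EM for a two-component Poisson mixture (an assumed stand-in model)."""
    n = len(counts)
    mean = sum(counts) / n
    lam_lo, lam_hi, pi_hi = max(0.25 * mean, 1e-3), max(2.0 * mean, 1.0), 0.5
    for _ in range(n_iter):
        # E-step: soft assignment of each subject to the high-count group
        post = [posterior_high(k, pi_hi, lam_lo, lam_hi) for k in counts]
        # M-step: update the mixing weight and both rates
        w = sum(post)
        pi_hi = min(max(w / n, 1e-6), 1.0 - 1e-6)
        lam_hi = max(sum(p * k for p, k in zip(post, counts)) / max(w, 1e-12), 1e-6)
        lam_lo = max(sum((1 - p) * k for p, k in zip(post, counts)) / max(n - w, 1e-12), 1e-6)
    if lam_hi < lam_lo:  # keep "high" as the label of the larger rate
        lam_lo, lam_hi, pi_hi = lam_hi, lam_lo, 1.0 - pi_hi
    return pi_hi, lam_lo, lam_hi

# Hypothetical per-subject diagnosis code counts (toy data, not from the study)
toy_counts = [0] * 60 + [1] * 25 + [2] * 8 + [5] * 3 + [7] * 5 + [9] * 6 + [12] * 4

pi_hi, lam_lo, lam_hi = fit_two_group_mixture(toy_counts)
for k in (0, 1, 2, 5, 9):
    threshold_label = int(k >= 2)  # the ">=2 codes" rule mentioned in OBJECTIVE
    prob = posterior_high(k, pi_hi, lam_lo, lam_hi)
    print(f"codes={k:2d}  threshold(>=2)={threshold_label}  probability={prob:.3f}")
```

Under these assumptions, a subject with 2 codes and a subject with 9 codes receive the same label from the ≥2 rule but different probabilities from the mixture, which is the kind of information in the code counts that the CONCLUSIONS credit for the gain in power.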

Published In

J Am Med Inform Assoc

DOI

10.1093/jamia/ocy056

EISSN

1527-974X

Publication Date

October 1, 2018

Volume

25

Issue

10

Start / End Page

1359 / 1365

Location

England

Related Subject Headings

  • Risk
  • Probability
  • Polymorphism, Single Nucleotide
  • Phenotype
  • Medical Informatics
  • International Classification of Diseases
  • Hyperlipidemias
  • Humans
  • Genetic Testing
  • Genetic Association Studies
 

Citation

APA
Sinnott, J. A., Cai, F., Yu, S., Hejblum, B. P., Hong, C., Kohane, I. S., & Liao, K. P. (2018). PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies. J Am Med Inform Assoc, 25(10), 1359–1365. https://doi.org/10.1093/jamia/ocy056

Chicago
Sinnott, Jennifer A., Fiona Cai, Sheng Yu, Boris P. Hejblum, Chuan Hong, Isaac S. Kohane, and Katherine P. Liao. “PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies.” J Am Med Inform Assoc 25, no. 10 (October 1, 2018): 1359–65. https://doi.org/10.1093/jamia/ocy056.

ICMJE
Sinnott JA, Cai F, Yu S, Hejblum BP, Hong C, Kohane IS, et al. PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies. J Am Med Inform Assoc. 2018 Oct 1;25(10):1359–65.

MLA
Sinnott, Jennifer A., et al. “PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies.” J Am Med Inform Assoc, vol. 25, no. 10, Oct. 2018, pp. 1359–65. PubMed, doi:10.1093/jamia/ocy056.

NLM
Sinnott JA, Cai F, Yu S, Hejblum BP, Hong C, Kohane IS, Liao KP. PheProb: probabilistic phenotyping using diagnosis codes to improve power for genetic association studies. J Am Med Inform Assoc. 2018 Oct 1;25(10):1359–1365.