Scholars@Duke publication: High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP).

Publication, Journal Article

Zhang, Y; Cai, T; Yu, S; Cho, K; Hong, C; Sun, J; Huang, J; Ho, Y-L; Ananthakrishnan, AN; Xia, Z; Shaw, SY; Gainer, V; Castro, V; Link, N ...

Published in: Nat Protoc

December 2019

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 days if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
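The relationship between the three final products named in the abstract (an algorithm, a per-patient probability, and a yes/no classification) can be illustrated with a minimal sketch. This is not the PheCAP implementation: the logistic form, the log(1 + count) transform, the feature names, the coefficients, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math

# Hypothetical coefficients for illustration only; a real algorithm would be
# trained against chart-reviewed gold standard labels.
EXAMPLE_WEIGHTS = {"diagnosis_code_count": 0.8, "nlp_mention_count": 0.6}
EXAMPLE_INTERCEPT = -2.0


def phenotype_probability(features, weights, intercept):
    """Map log-transformed feature counts to a probability with a logistic model."""
    score = intercept + sum(
        weights[name] * math.log1p(features.get(name, 0)) for name in weights
    )
    return 1.0 / (1.0 + math.exp(-score))


def classify(probability, threshold=0.5):
    """Turn a probability into a yes/no phenotype call (threshold is assumed)."""
    return "yes" if probability >= threshold else "no"


# Made-up patients: counts of structured codes and of NLP-extracted mentions.
patients = {
    "patient_a": {"diagnosis_code_count": 12, "nlp_mention_count": 20},
    "patient_b": {"diagnosis_code_count": 0, "nlp_mention_count": 1},
}

for patient_id, feats in patients.items():
    p = phenotype_probability(feats, EXAMPLE_WEIGHTS, EXAMPLE_INTERCEPT)
    print(patient_id, round(p, 3), classify(p))
```

In practice the threshold would be chosen to meet a target specificity or positive predictive value on the labeled set, rather than fixed at 0.5.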

Duke Scholars

Author Chuan Hong Biostatistics & Bioinformatics, Division of Translational Bi ...

Published In

Nat Protoc

DOI

10.1038/s41596-019-0227-6

EISSN

1750-2799

Publication Date

December 2019

Volume

14

Issue

12

Start / End Page

3426 / 3444

Location

England

Related Subject Headings

Phenotype
Natural Language Processing
Machine Learning
Humans
High-Throughput Screening Assays
Electronic Health Records
Data Interpretation, Statistical
Data Analysis
Bioinformatics
Algorithms

Citation

APA

Zhang, Y., Cai, T., Yu, S., Cho, K., Hong, C., Sun, J., … Liao, K. P. (2019). High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc, 14(12), 3426–3444. https://doi.org/10.1038/s41596-019-0227-6

Chicago

Zhang, Yichi, Tianrun Cai, Sheng Yu, Kelly Cho, Chuan Hong, Jiehuan Sun, Jie Huang, et al. “High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP).” Nat Protoc 14, no. 12 (December 2019): 3426–44. https://doi.org/10.1038/s41596-019-0227-6.

ICMJE

Zhang Y, Cai T, Yu S, Cho K, Hong C, Sun J, et al. High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc. 2019 Dec;14(12):3426–44.

MLA

Zhang, Yichi, et al. “High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP).” Nat Protoc, vol. 14, no. 12, Dec. 2019, pp. 3426–44. Pubmed, doi:10.1038/s41596-019-0227-6.

NLM

Zhang Y, Cai T, Yu S, Cho K, Hong C, Sun J, Huang J, Ho Y-L, Ananthakrishnan AN, Xia Z, Shaw SY, Gainer V, Castro V, Link N, Honerlaw J, Huang S, Gagnon D, Karlson EW, Plenge RM, Szolovits P, Savova G, Churchill S, O’Donnell C, Murphy SN, Gaziano JM, Kohane I, Liao KP. High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc. 2019 Dec;14(12):3426–3444.
