Scholars@Duke publication: Large-scale evaluation of automated clinical note de-identification and its impact on information extraction.

Publication, Journal Article

Deleger, L; Molnar, K; Savova, G; Xia, F; Lingren, T; Li, Q; Marsolo, K; Jegga, A; Kaiser, M; Stoutenborough, L; Solti, I

Published in: J Am Med Inform Assoc

January 1, 2013

OBJECTIVE: (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms on the de-identified documents.

MATERIAL AND METHODS: A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. Sensitivity, precision, and F value of two automated de-identification systems for removing all 18 HIPAA-defined protected health information elements were computed. Performance was assessed against a manually generated 'gold standard'. Statistical significance was tested. The automated de-identification performance was also compared with that of two humans on a 10% subsample of the gold standard. The effect of de-identification on the performance of subsequent medication extraction was measured.

RESULTS: The gold standard included 30 815 protected health information elements and more than one million tokens. The most accurate NLP method had 91.92% sensitivity (R) and 95.08% precision (P) overall. The performance of the system was indistinguishable from that of human annotators (annotators' performance was 92.15% (R)/93.95% (P) and 94.55% (R)/88.45% (P) overall, while the best system obtained 92.91% (R)/95.73% (P) on the same text). The impact of automated de-identification on the utility of the narrative notes for subsequent information extraction was minimal, as measured by the sensitivity and precision of medication name extraction.

DISCUSSION AND CONCLUSION: NLP-based de-identification shows excellent performance that rivals the performance of human annotators. Furthermore, unlike manual de-identification, the automated approach scales up to millions of documents quickly and inexpensively.
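The three measures named in the abstract (sensitivity, precision, and F value) can be illustrated with a minimal Python sketch. It assumes the conventional definitions based on true-positive, false-positive, and false-negative counts, and a balanced F value (beta = 1); the abstract does not state which F weighting the authors used, and the names `Counts`, `sensitivity`, `precision`, and `f_value` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Counts of protected health information (PHI) elements,
    judged against a gold standard. Hypothetical container."""
    true_positives: int   # PHI elements the system found correctly
    false_positives: int  # non-PHI text the system flagged as PHI
    false_negatives: int  # PHI elements the system missed


def sensitivity(c: Counts) -> float:
    """Sensitivity (recall, R): share of gold-standard PHI that was found."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def precision(c: Counts) -> float:
    """Precision (P): share of flagged elements that really are PHI."""
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def f_value(p: float, r: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.
    beta = 1 (an assumption here) gives the balanced F value."""
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


# Toy counts, invented purely for illustration (not data from the study).
toy = Counts(true_positives=90, false_positives=5, false_negatives=10)
toy_r = sensitivity(toy)
toy_p = precision(toy)

# Balanced F value implied by the overall figures reported in the abstract
# (91.92% sensitivity, 95.08% precision), under the beta = 1 assumption.
reported_f = f_value(p=0.9508, r=0.9192)

if __name__ == "__main__":
    print(f"toy: R={toy_r:.4f} P={toy_p:.4f} F={f_value(toy_p, toy_r):.4f}")
    print(f"reported R/P imply F={reported_f:.4f}")
```

Under these assumptions, the overall sensitivity and precision reported for the most accurate NLP method correspond to a balanced F value of roughly 93.5%. That figure is derived here and is not quoted from the abstract.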

Duke Scholars

Author: Keith Allen Marsolo, Population Health Sciences

Published In

J Am Med Inform Assoc

DOI

10.1136/amiajnl-2012-001012

EISSN

1527-974X

Publication Date

January 1, 2013

Volume

20

Issue

1

Start / End Page

84 / 94

Location

England

Related Subject Headings

United States
Technology Assessment, Biomedical
Reproducibility of Results
Observer Variation
Natural Language Processing
Medical Informatics
Information Dissemination
Humans
Hospitals, Pediatric
Electronic Health Records

Citation

APA

Deleger, L., Molnar, K., Savova, G., Xia, F., Lingren, T., Li, Q., … Solti, I. (2013). Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. J Am Med Inform Assoc, 20(1), 84–94. https://doi.org/10.1136/amiajnl-2012-001012

Chicago

Deleger, Louise, Katalin Molnar, Guergana Savova, Fei Xia, Todd Lingren, Qi Li, Keith Marsolo, et al. “Large-scale evaluation of automated clinical note de-identification and its impact on information extraction.” J Am Med Inform Assoc 20, no. 1 (January 1, 2013): 84–94. https://doi.org/10.1136/amiajnl-2012-001012.

ICMJE

Deleger L, Molnar K, Savova G, Xia F, Lingren T, Li Q, et al. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. J Am Med Inform Assoc. 2013 Jan 1;20(1):84–94.

MLA

Deleger, Louise, et al. “Large-scale evaluation of automated clinical note de-identification and its impact on information extraction.” J Am Med Inform Assoc, vol. 20, no. 1, Jan. 2013, pp. 84–94. PubMed, doi:10.1136/amiajnl-2012-001012.

NLM

Deleger L, Molnar K, Savova G, Xia F, Lingren T, Li Q, Marsolo K, Jegga A, Kaiser M, Stoutenborough L, Solti I. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. J Am Med Inform Assoc. 2013 Jan 1;20(1):84–94.
