Evaluation of emergency medical text processor, a system for cleaning chief complaint text data.

Journal Article


Emergency Medical Text Processor (EMT-P) version 1, a natural language processing system that cleans emergency department text (e.g., chst pn, chest pai), was developed to maximize extraction of standard terms (e.g., chest pain). The authors compared the number of standard terms extracted from raw chief complaint (CC) data with that for CC data cleaned with EMT-P and evaluated the accuracy of EMT-P.


This cross-sectional observational study included CC text entries for all emergency department visits to three tertiary care centers in 2001. Terms were extracted from CC entries before and after cleaning with EMT-P. Descriptive statistics included the number and percentage of all entries (tokens) and all unique entries (types) that matched a standard term from the Unified Medical Language System (UMLS). An expert panel rated the accuracy of the CC-UMLS term matches; inter-rater reliability was measured with the kappa statistic.
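The token/type distinction in the paragraph above can be illustrated with a minimal sketch. It assumes exact string matching against a small, hypothetical set of standard terms; the function name `match_rates`, the toy entries, and the term set are illustrative only and are not taken from EMT-P or the UMLS.

```python
from collections import Counter

def match_rates(entries, standard_terms):
    """Return (token_rate, type_rate): the fraction of all entries (tokens)
    and of unique entries (types) that exactly match a standard term."""
    counts = Counter(entries)              # unique entry -> number of occurrences
    total_tokens = sum(counts.values())    # every entry counts once
    total_types = len(counts)              # every distinct entry counts once
    matched_types = [t for t in counts if t in standard_terms]
    matched_tokens = sum(counts[t] for t in matched_types)
    return matched_tokens / total_tokens, len(matched_types) / total_types

# Hypothetical toy data: five chief complaint entries, four unique.
entries = ["chest pain", "chst pn", "chest pain", "chest pai", "fever"]
standard_terms = {"chest pain", "fever"}   # stand-in for a standard vocabulary

token_rate, type_rate = match_rates(entries, standard_terms)
```

In this toy data, three of the five tokens and two of the four types match. The token rate is higher because the correctly spelled entry occurs more than once, while each misspelling is a separate type.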


The authors collected 203,509 CC entry tokens, of which 63,946 were unique entry types. For the raw data, 89,337 tokens (44%) and 5,081 types (8%) matched a standard term. After EMT-P cleaning, 168,050 tokens (83%) and 44,430 types (69%) matched a standard term. The expert panel reached consensus on 201 of the 222 CC-UMLS term matches reviewed (kappa=0.69-0.72). Ninety-six percent of the 201 matches were rated equivalent or related. Thirty-eight percent of the nonmatches were found to match UMLS concepts.
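The reported percentages follow directly from the counts above; a short arithmetic check, assuming each percentage is rounded to the nearest whole number (the helper name `pct` is illustrative):

```python
TOTAL_TOKENS = 203_509   # all CC entries
TOTAL_TYPES = 63_946     # unique CC entries

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)

raw_token_pct = pct(89_337, TOTAL_TOKENS)       # raw data, tokens
raw_type_pct = pct(5_081, TOTAL_TYPES)          # raw data, types
clean_token_pct = pct(168_050, TOTAL_TOKENS)    # after EMT-P cleaning, tokens
clean_type_pct = pct(44_430, TOTAL_TYPES)       # after EMT-P cleaning, types

print(raw_token_pct, raw_type_pct, clean_token_pct, clean_type_pct)
```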


EMT-P version 1 is relatively accurate, and cleaning with EMT-P improved the CC-UMLS term match rate over raw data. The authors identified areas for improvement in future EMT-P versions and issues to be resolved in developing a standard CC terminology.

Cited Authors

  • Travers, DA; Haas, SW

Published Date

  • November 2004

Published In

Volume / Issue

  • 11 / 11

Start / End Page

  • 1170 - 1176

PubMed ID

  • 15528581

Electronic International Standard Serial Number (EISSN)

  • 1553-2712

International Standard Serial Number (ISSN)

  • 1069-6563

Digital Object Identifier (DOI)

  • 10.1197/j.aem.2004.08.012


  • eng