Text data extraction for a prospective, research-focused data mart: implementation and validation.

Publication, Journal Article
Hinchcliff, M; Just, E; Podlusky, S; Varga, J; Chang, RW; Kibbe, WA
Published in: BMC Med Inform Decis Mak
September 13, 2012

BACKGROUND: Translational research typically requires data abstracted from medical records as well as data collected specifically for research. Unfortunately, many data within electronic health records are represented as text that is not amenable to aggregation for analyses. We present a scalable open source SQL Server Integration Services package, called Regextractor, for including regular expression parsers into a classic extract, transform, and load workflow. We have used Regextractor to abstract discrete data from textual reports from a number of 'machine generated' sources. To validate this package, we created a pulmonary function test data mart and analyzed the quality of the data mart versus manual chart review.

METHODS: Eleven variables from pulmonary function tests performed closest to the initial clinical evaluation date were studied for 100 randomly selected subjects with scleroderma. One research assistant manually reviewed, abstracted, and entered relevant data into a database. Correlation with data obtained from the automated pulmonary function test data mart within the Northwestern Medical Enterprise Data Warehouse was determined.

RESULTS: There was a near perfect (99.5%) agreement between results generated from the Regextractor package and those obtained via manual chart abstraction. The pulmonary function test data mart has been used subsequently to monitor disease progression of patients in the Northwestern Scleroderma Registry. In addition to the pulmonary function test example presented in this manuscript, the Regextractor package has been used to create cardiac catheterization and echocardiography data marts. The Regextractor package was released as open source software in October 2009 and has been downloaded 552 times as of 6/1/2012.

CONCLUSIONS: Collaboration between clinical researchers and biomedical informatics experts enabled the development and validation of a tool (Regextractor) to parse, abstract and assemble structured data from text data contained in the electronic health record. Regextractor has been successfully used to create additional data marts in other medical domains and is available to the public.
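The approach described in the abstract, regular expression parsers that pull discrete values out of machine-generated report text followed by a comparison against manual abstraction, can be illustrated with a small sketch. This is not the Regextractor package itself, which is a SQL Server Integration Services component. The report layout, the variable names, the patterns, and the simple per-variable agreement measure below are all assumptions made for illustration only.

```python
import re

# Hypothetical report text; the layout and field labels are invented for illustration.
REPORT = """\
PULMONARY FUNCTION TEST
FVC (L): 3.21  Pred: 4.02  %Pred: 80
FEV1 (L): 2.60  Pred: 3.25  %Pred: 80
DLCO (mL/min/mmHg): 18.4  Pred: 25.1  %Pred: 73
"""

# One assumed pattern per variable; each captures the measured value.
PATTERNS = {
    "fvc": re.compile(r"^FVC \(L\):\s*(\d+(?:\.\d+)?)", re.MULTILINE),
    "fev1": re.compile(r"^FEV1 \(L\):\s*(\d+(?:\.\d+)?)", re.MULTILINE),
    "dlco": re.compile(r"^DLCO \([^)]*\):\s*(\d+(?:\.\d+)?)", re.MULTILINE),
}


def extract(text):
    """Return {variable: float or None}, with None when a pattern does not match."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = float(match.group(1)) if match else None
    return row


def percent_agreement(automated, manual):
    """Percentage of manually abstracted variables whose automated value is equal."""
    same = sum(1 for key in manual if automated.get(key) == manual[key])
    return 100.0 * same / len(manual)


automated = extract(REPORT)
# Hypothetical values standing in for a research assistant's chart review.
manual = {"fvc": 3.21, "fev1": 2.60, "dlco": 18.4}
agreement = percent_agreement(automated, manual)
```

In an extract, transform, and load setting, a function like `extract` would run once per report during the transform step, and each returned dictionary would become one row of the data mart. Returning `None` for unmatched patterns keeps missing values visible instead of silently dropping them.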

Published In

BMC Med Inform Decis Mak

DOI

10.1186/1472-6947-12-106

EISSN

1472-6947

Publication Date

September 13, 2012

Volume

12

Start / End Page

106

Location

England

Related Subject Headings

  • United States
  • Translational Research, Biomedical
  • Software
  • Sclerosis
  • Scleroderma, Systemic
  • Respiratory Function Tests
  • Medical Informatics
  • Humans
  • Electronic Health Records
 

Citation

APA: Hinchcliff, M., Just, E., Podlusky, S., Varga, J., Chang, R. W., & Kibbe, W. A. (2012). Text data extraction for a prospective, research-focused data mart: implementation and validation. BMC Med Inform Decis Mak, 12, 106. https://doi.org/10.1186/1472-6947-12-106
Chicago: Hinchcliff, Monique, Eric Just, Sofia Podlusky, John Varga, Rowland W. Chang, and Warren A. Kibbe. “Text data extraction for a prospective, research-focused data mart: implementation and validation.” BMC Med Inform Decis Mak 12 (September 13, 2012): 106. https://doi.org/10.1186/1472-6947-12-106.
ICMJE: Hinchcliff M, Just E, Podlusky S, Varga J, Chang RW, Kibbe WA. Text data extraction for a prospective, research-focused data mart: implementation and validation. BMC Med Inform Decis Mak. 2012 Sep 13;12:106.
MLA: Hinchcliff, Monique, et al. “Text data extraction for a prospective, research-focused data mart: implementation and validation.” BMC Med Inform Decis Mak, vol. 12, Sept. 2012, p. 106. Pubmed, doi:10.1186/1472-6947-12-106.
NLM: Hinchcliff M, Just E, Podlusky S, Varga J, Chang RW, Kibbe WA. Text data extraction for a prospective, research-focused data mart: implementation and validation. BMC Med Inform Decis Mak. 2012 Sep 13;12:106.