Information Extraction for Clinical Data Mining: A Mammography Case Study.

Publication, Conference
Nassif, H; Woods, R; Burnside, E; Ayvaci, M; Shavlik, J; Page, D
Published in: Proc IEEE Int Conf Data Min
2009

Breast cancer is the leading cause of cancer mortality in women between the ages of 15 and 54. During mammography screening, radiologists use a strict lexicon (BI-RADS) to describe and report their findings. Mammography records are then stored in a well-defined database format (NMD). Recently, researchers have applied data mining and machine learning techniques to these databases and successfully built breast cancer classifiers that can help in early detection of malignancy. However, the validity of these models depends on the quality of the underlying databases. Unfortunately, most databases suffer from inconsistencies, missing data, inter-observer variability and inappropriate term usage. In addition, many databases are not compliant with the NMD format, consist solely of text reports, or both. BI-RADS feature extraction from free text and consistency checks between recorded predictive variables and text reports are crucial to addressing this problem. We describe a general scheme for concept information retrieval from free text given a lexicon, and present a BI-RADS feature extraction algorithm for clinical data mining. It consists of a syntax analyzer, a concept finder and a negation detector. The syntax analyzer preprocesses the input into individual sentences. The concept finder uses a semantic grammar based on the BI-RADS lexicon and the experts' input, and parses each sentence to detect BI-RADS concepts. Once a concept is located, a lexical scanner checks for negation. Our method handles multiple latent concepts within the text and filters out ultrasound concepts. On our dataset, our algorithm achieves 97.7% precision, 95.5% recall and an F1-score of 0.97. It outperforms manual feature extraction at the 5% statistical significance level.
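The three stages named in the abstract (syntax analyzer, concept finder, negation detector) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny `LEXICON`, the `NEGATION` cue list, the `ULTRASOUND` filter, and all function names are hypothetical stand-ins, and plain regular expressions replace the semantic grammar the abstract describes. The last helper simply recomputes the F1-score from the reported precision and recall.

```python
import re

# Hypothetical mini-lexicon: concept name -> pattern. The real BI-RADS lexicon
# and the paper's semantic grammar are far richer; these entries are illustrative.
LEXICON = {
    "mass": re.compile(r"\bmass(?:es)?\b", re.IGNORECASE),
    "calcification": re.compile(r"\b(?:micro)?calcifications?\b", re.IGNORECASE),
    "architectural_distortion": re.compile(r"\barchitectural distortion\b", re.IGNORECASE),
}

# Assumed negation cues, searched for in the text preceding a concept mention.
NEGATION = re.compile(r"\b(?:no|not|without|negative for|absence of)\b", re.IGNORECASE)

# Assumed marker for sentences that describe ultrasound findings; these are skipped.
ULTRASOUND = re.compile(r"\b(?:ultrasound|sonograph\w*)\b", re.IGNORECASE)


def split_sentences(report):
    """Syntax analyzer stand-in: split the report on sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def extract_concepts(report):
    """Return {concept: present}; a non-negated mention wins over a negated one."""
    found = {}
    for sentence in split_sentences(report):
        if ULTRASOUND.search(sentence):
            continue  # filter out ultrasound concepts
        for concept, pattern in LEXICON.items():
            for match in pattern.finditer(sentence):
                # Negation check: any cue appearing earlier in the same sentence.
                negated = NEGATION.search(sentence[:match.start()]) is not None
                found[concept] = found.get(concept, False) or not negated
    return found


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


report = (
    "There is a spiculated mass in the left breast. "
    "No suspicious calcifications are seen. "
    "Ultrasound shows an architectural distortion."
)
print(extract_concepts(report))
print(round(f1_score(0.977, 0.955), 2))
```

On the sample report, the mass is recorded as present, the calcification as negated, and the ultrasound sentence is ignored. Rounding `f1_score(0.977, 0.955)` to two decimals gives 0.97, which agrees with the reported figure.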

Published In

Proc IEEE Int Conf Data Min

DOI

10.1109/icdmw.2009.63

ISSN

1550-4786

Publication Date

2009

Start / End Page

37 / 42

Location

United States
 

Citation

APA: Nassif, H., Woods, R., Burnside, E., Ayvaci, M., Shavlik, J., & Page, D. (2009). Information Extraction for Clinical Data Mining: A Mammography Case Study. In Proc IEEE Int Conf Data Min (pp. 37–42). United States. https://doi.org/10.1109/icdmw.2009.63
Chicago: Nassif, Houssam, Ryan Woods, Elizabeth Burnside, Mehmet Ayvaci, Jude Shavlik, and David Page. “Information Extraction for Clinical Data Mining: A Mammography Case Study.” In Proc IEEE Int Conf Data Min, 37–42, 2009. https://doi.org/10.1109/icdmw.2009.63.
ICMJE: Nassif H, Woods R, Burnside E, Ayvaci M, Shavlik J, Page D. Information Extraction for Clinical Data Mining: A Mammography Case Study. In: Proc IEEE Int Conf Data Min. 2009. p. 37–42.
MLA: Nassif, Houssam, et al. “Information Extraction for Clinical Data Mining: A Mammography Case Study.” Proc IEEE Int Conf Data Min, 2009, pp. 37–42. Pubmed, doi:10.1109/icdmw.2009.63.
NLM: Nassif H, Woods R, Burnside E, Ayvaci M, Shavlik J, Page D. Information Extraction for Clinical Data Mining: A Mammography Case Study. Proc IEEE Int Conf Data Min. 2009. p. 37–42.
