Efficient information extraction over evolving text data

Publication, Journal Article
Chen, F; Doan, A; Yang, J; Ramakrishnan, R
Published in: Proceedings - International Conference on Data Engineering
October 1, 2008

Most current information extraction (IE) approaches have considered only static text corpora, over which we typically have to apply IE only once. Many real-world text corpora, however, are dynamic. They evolve over time, and to keep extracted information up to date, we often must apply IE repeatedly, to consecutive corpus snapshots. We describe Cyclex, an approach that efficiently executes such repeated IE by recycling previous IE efforts. Specifically, given a current corpus snapshot U, Cyclex identifies text portions of U that also appear in the previous corpus snapshot V. Since Cyclex has already executed IE over V, it can now recycle the IE results of these parts, combining them with the results of executing IE over the remaining parts of U to produce the complete IE results for U. Realizing Cyclex raises many challenges, including modeling information extractors, exploring the trade-off between runtime and completeness in identifying overlapping text, and making informed, cost-based decisions between redoing IE from scratch and recycling previous IE results. We describe initial solutions to these challenges, and experiments over two real-world data sets that demonstrate the utility of our approach. © 2008 IEEE.
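
The recycling idea in the abstract can be illustrated with a minimal Python sketch. This is not the Cyclex algorithm itself: it assumes, purely for illustration, that the extractor processes each line independently (so a mention never spans lines) and that overlap between snapshots is detected by exact line matching. The names `extract_emails` and `extract_with_recycling` are hypothetical and do not come from the paper.

```python
import re
from typing import Callable, Dict, List, Tuple

# Results are keyed by line text, so an unchanged line can be looked up directly.
Results = Dict[str, List[str]]


def extract_emails(line: str) -> List[str]:
    """Toy extractor (hypothetical): return email-like strings found in one line."""
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", line)


def extract_with_recycling(
    snapshot: str,
    previous_results: Results,
    extractor: Callable[[str], List[str]],
) -> Tuple[Results, int]:
    """Run IE over a snapshot, reusing results for lines already seen.

    Assumption: the extractor's output for a line depends only on that line,
    so a result computed on the previous snapshot is still valid here.
    Returns the results plus the number of extractor calls actually made.
    """
    results: Results = {}
    extractor_calls = 0
    for line in snapshot.splitlines():
        if line in results:
            continue  # duplicate line within the same snapshot
        if line in previous_results:
            results[line] = previous_results[line]  # recycle earlier work
        else:
            results[line] = extractor(line)  # only new text is re-extracted
            extractor_calls += 1
    return results, extractor_calls


# Snapshot V is the earlier version; U adds one new line.
V = "Contact alice@example.com\nNo address here"
U = "Contact alice@example.com\nNew: bob@example.org\nNo address here"

results_v, calls_v = extract_with_recycling(V, {}, extract_emails)
results_u, calls_u = extract_with_recycling(U, results_v, extract_emails)
print(calls_v, calls_u)
```

Under the stated assumption, the second run invokes the extractor only on the line that is new in U, and the combined results match what a from-scratch run over U would produce. An extractor whose output depends on surrounding context would need a more careful notion of which regions are safe to recycle, which is the kind of extractor modeling the abstract lists among its challenges.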

Published In

Proceedings - International Conference on Data Engineering

DOI

10.1109/ICDE.2008.4497503

ISSN

1084-4627

Publication Date

October 1, 2008

Start / End Page

943 / 952
 

Citation

APA
Chen, F., Doan, A., Yang, J., & Ramakrishnan, R. (2008). Efficient information extraction over evolving text data. Proceedings - International Conference on Data Engineering, 943–952. https://doi.org/10.1109/ICDE.2008.4497503

Chicago
Chen, F., A. Doan, J. Yang, and R. Ramakrishnan. “Efficient information extraction over evolving text data.” Proceedings - International Conference on Data Engineering, October 1, 2008, 943–52. https://doi.org/10.1109/ICDE.2008.4497503.

ICMJE
Chen F, Doan A, Yang J, Ramakrishnan R. Efficient information extraction over evolving text data. Proceedings - International Conference on Data Engineering. 2008 Oct 1;943–52.

MLA
Chen, F., et al. “Efficient information extraction over evolving text data.” Proceedings - International Conference on Data Engineering, Oct. 2008, pp. 943–52. Scopus, doi:10.1109/ICDE.2008.4497503.

NLM
Chen F, Doan A, Yang J, Ramakrishnan R. Efficient information extraction over evolving text data. Proceedings - International Conference on Data Engineering. 2008 Oct 1;943–952.
