Weighted proximity best-joins for information retrieval

Publication, Journal Article
Thonangi, R; He, H; Doan, AH; Wang, H; Yang, J
Published in: Proceedings - International Conference on Data Engineering
July 8, 2009

We consider the problem of efficiently computing weighted proximity best-joins over multiple lists, with applications in information retrieval and extraction. We are given a multi-term query, and for each query term, a list of all its matches with scores, sorted by locations. The problem is to find the overall best matchset, consisting of one match from each list, such that the combined score according to a scoring function is maximized. We study three types of functions that consider both individual match scores and proximity of match locations in scoring a matchset. We present algorithms that exploit the properties of the scoring functions in order to achieve time complexities linear in the size of the match lists. Experiments show that these algorithms greatly outperform the naive algorithm based on taking the cross product of all match lists. Finally, we extend our algorithms for an alternative problem definition applicable to information extraction, where we need to find all good matchsets in a document. © 2009 IEEE.
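
The naive baseline mentioned in the abstract can be sketched as follows. This is only an illustration of the cross-product approach, not the linear-time algorithms of the paper. The abstract does not define the three scoring functions, so `example_score` (sum of match scores minus a penalty on the span of locations) and the names `Match` and `naive_best_join` are assumptions made for this sketch.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# A match is (location, score). One list of matches per query term,
# each sorted by location, as described in the abstract.
Match = Tuple[int, float]


def example_score(matchset: Sequence[Match]) -> float:
    """Hypothetical scoring function (an assumption, not from the paper):
    reward high individual scores, penalize matches that are far apart."""
    locations = [loc for loc, _ in matchset]
    total = sum(score for _, score in matchset)
    span = max(locations) - min(locations)
    return total - 0.1 * span


def naive_best_join(
    match_lists: List[List[Match]],
    score: Callable[[Sequence[Match]], float] = example_score,
) -> Optional[Tuple[Match, ...]]:
    """Enumerate the cross product of all match lists and return the
    matchset (one match per list) with the highest combined score.
    The cost grows with the product of the list lengths."""
    if not match_lists or any(not matches for matches in match_lists):
        return None  # no matchset exists if some term has no match
    return max(product(*match_lists), key=score)


if __name__ == "__main__":
    lists = [
        [(1, 0.9), (40, 1.0)],  # matches for term 1
        [(3, 0.8), (42, 0.5)],  # matches for term 2
        [(5, 0.7), (90, 1.0)],  # matches for term 3
    ]
    print(naive_best_join(lists))
```

With these toy lists, the tightly clustered matches at locations 1, 3, and 5 win under the assumed scoring function even though higher-scoring individual matches exist elsewhere, which is the trade-off between match score and proximity that the abstract describes.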

Published In

Proceedings - International Conference on Data Engineering

DOI

10.1109/ICDE.2009.61

ISSN

1084-4627

Publication Date

July 8, 2009

Start / End Page

234 / 245
 

Citation

APA: Thonangi, R., He, H., Doan, A. H., Wang, H., & Yang, J. (2009). Weighted proximity best-joins for information retrieval. Proceedings - International Conference on Data Engineering, 234–245. https://doi.org/10.1109/ICDE.2009.61
Chicago: Thonangi, R., H. He, A. H. Doan, H. Wang, and J. Yang. “Weighted proximity best-joins for information retrieval.” Proceedings - International Conference on Data Engineering, July 8, 2009, 234–45. https://doi.org/10.1109/ICDE.2009.61.
ICMJE: Thonangi R, He H, Doan AH, Wang H, Yang J. Weighted proximity best-joins for information retrieval. Proceedings - International Conference on Data Engineering. 2009 Jul 8;234–45.
MLA: Thonangi, R., et al. “Weighted proximity best-joins for information retrieval.” Proceedings - International Conference on Data Engineering, July 2009, pp. 234–45. Scopus, doi:10.1109/ICDE.2009.61.
NLM: Thonangi R, He H, Doan AH, Wang H, Yang J. Weighted proximity best-joins for information retrieval. Proceedings - International Conference on Data Engineering. 2009 Jul 8;234–245.
