Autoblock: A hands-off blocking framework for entity matching

Publication, Conference
Zhang, W; Wei, H; Sisman, B; Dong, XL; Faloutsos, C; Page, D
Published in: WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining
January 20, 2020

Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most of the traditional blocking methods are learning-free and key-based, and their successes are largely built on laborious human effort in cleaning data and designing blocking keys. In this paper, we propose AutoBlock, a novel hands-off blocking framework for entity matching, based on similarity-preserving representation learning and nearest neighbor search. Our contributions include: (a) Automation: AutoBlock frees users from laborious data cleaning and blocking key tuning. (b) Scalability: AutoBlock has a sub-quadratic total time complexity and can be easily deployed for millions of records. (c) Effectiveness: AutoBlock outperforms a wide range of competitive baselines on multiple large-scale, real-world datasets, especially when datasets are dirty and/or unstructured.
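
To make the idea of blocking by nearest neighbor search concrete, here is a minimal Python sketch. It is not the authors' implementation: hashed character trigrams stand in for the learned similarity-preserving representation, and a brute-force scan stands in for the nearest neighbor index, so this toy version is quadratic rather than sub-quadratic. The names `trigram_vector`, `candidate_pairs`, `DIM`, `k`, and `min_sim`, as well as the sample records, are assumptions made for illustration only.

```python
import hashlib
import math

DIM = 1024  # hypothetical vector size, chosen only for this sketch


def trigram_vector(text, dim=DIM):
    """Hash the character trigrams of `text` into an L2-normalised count vector."""
    padded = f"  {text.lower().strip()} "
    vec = [0.0] * dim
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        # md5 serves only as a stable hash; Python's built-in hash() is salted per process.
        bucket = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def candidate_pairs(records, k=1, min_sim=0.5):
    """Keep, for each record, its k most similar other records as candidate pairs.

    Brute-force cosine search: fine for a demo, quadratic in the number of records.
    """
    vecs = [trigram_vector(r) for r in records]
    pairs = set()
    for i, vi in enumerate(vecs):
        scored = sorted(
            ((sum(x * y for x, y in zip(vi, vj)), j)
             for j, vj in enumerate(vecs) if j != i),
            reverse=True,
        )
        for sim, j in scored[:k]:
            if sim >= min_sim:
                pairs.add((min(i, j), max(i, j)))
    return pairs


# Made-up, deliberately dirty product records.
records = [
    "Apple iPhone 11 64GB black",
    "apple iphone 11 (64 gb) - black",
    "Samsung Galaxy S10 128GB",
    "samsung galaxy s10 128 gb blue",
    "Sony WH-1000XM3 headphones",
]
pairs = candidate_pairs(records)
print(sorted(pairs))
```

Only the surviving candidate pairs would be passed on to the more expensive matching step. In a setting with millions of records, the brute-force loop would presumably be replaced by an approximate nearest neighbor index, and the trigram vectors by a learned encoder.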

Published In

WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining

DOI

10.1145/3336191.3371813

ISBN

9781450368223

Publication Date

January 20, 2020

Start / End Page

744 / 752

Citation

APA

Zhang, W., Wei, H., Sisman, B., Dong, X. L., Faloutsos, C., & Page, D. (2020). Autoblock: A hands-off blocking framework for entity matching. In WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining (pp. 744–752). https://doi.org/10.1145/3336191.3371813

Chicago

Zhang, W., H. Wei, B. Sisman, X. L. Dong, C. Faloutsos, and D. Page. “Autoblock: A hands-off blocking framework for entity matching.” In WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining, 744–52, 2020. https://doi.org/10.1145/3336191.3371813.

ICMJE

Zhang W, Wei H, Sisman B, Dong XL, Faloutsos C, Page D. Autoblock: A hands-off blocking framework for entity matching. In: WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining. 2020. p. 744–52.

MLA

Zhang, W., et al. “Autoblock: A hands-off blocking framework for entity matching.” WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining, 2020, pp. 744–52. Scopus, doi:10.1145/3336191.3371813.

NLM

Zhang W, Wei H, Sisman B, Dong XL, Faloutsos C, Page D. Autoblock: A hands-off blocking framework for entity matching. WSDM 2020 - Proceedings of the 13th International Conference on Web Search and Data Mining. 2020. p. 744–752.
