Automating entity matching model development

Publication, Conference
Wang, P; Zheng, W; Wang, J; Pei, J
Published in: Proceedings - International Conference on Data Engineering
April 1, 2021

This paper seeks to answer an important but unexplored question for Entity Matching (EM): can we automatically develop a good machine learning pipeline for the EM task? If so, to what extent can the process be automated? We find that a general-purpose AutoML tool cannot be directly applied to an EM problem, and thus propose AutoML-EM, an automated model pipeline development solution tailored for EM. In practice, however, another bottleneck of the EM problem is insufficient labeled data. To mitigate this issue, active-learning-based solutions are widely adopted. Under this setting, we propose AutoML-EM-Active, which investigates how to maximize the benefit of AutoML-EM with automatic data labeling. We provide fundamental insights into our solutions and conduct extensive experiments to examine their performance on benchmark datasets. The results suggest that AutoML-EM not only avoids human involvement in the model development process but also reaches or exceeds state-of-the-art EM performance, and that AutoML-EM-Active effectively improves model performance under the active learning setting.

Published In

Proceedings - International Conference on Data Engineering

DOI

10.1109/ICDE51399.2021.00116

ISSN

1084-4627

ISBN

9781728191843

Publication Date

April 1, 2021

Volume

2021-April

Start / End Page

1296 / 1307
 

Citation

APA: Wang, P., Zheng, W., Wang, J., & Pei, J. (2021). Automating entity matching model development. In Proceedings - International Conference on Data Engineering (Vol. 2021-April, pp. 1296–1307). https://doi.org/10.1109/ICDE51399.2021.00116

Chicago: Wang, P., W. Zheng, J. Wang, and J. Pei. “Automating entity matching model development.” In Proceedings - International Conference on Data Engineering, 2021-April:1296–1307, 2021. https://doi.org/10.1109/ICDE51399.2021.00116.

ICMJE: Wang P, Zheng W, Wang J, Pei J. Automating entity matching model development. In: Proceedings - International Conference on Data Engineering. 2021. p. 1296–307.

MLA: Wang, P., et al. “Automating entity matching model development.” Proceedings - International Conference on Data Engineering, vol. 2021-April, 2021, pp. 1296–307. Scopus, doi:10.1109/ICDE51399.2021.00116.

NLM: Wang P, Zheng W, Wang J, Pei J. Automating entity matching model development. Proceedings - International Conference on Data Engineering. 2021. p. 1296–1307.
