
(Almost) All of Entity Resolution

Publication, Journal Article
Binette, O; Steorts, RC
August 10, 2020

Whether the goal is to estimate the number of people who live in a congressional district, to estimate the number of individuals who have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all of these applications share a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a task commonly known as record linkage, de-duplication, or entity resolution. In this article, we review motivational applications and seminal papers that have led to the growth of this area. Specifically, we review the foundational work that began in the 1940s and 1950s and that has led to modern probabilistic record linkage. We review clustering approaches to entity resolution, semi- and fully supervised methods, and canonicalization, which are used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Finally, we discuss current research topics of practical importance.


Publication Date

August 10, 2020
 

Citation

APA: Binette, O., & Steorts, R. C. (2020). (Almost) All of Entity Resolution.
Chicago: Binette, Olivier, and Rebecca C. Steorts. “(Almost) All of Entity Resolution,” August 10, 2020.
ICMJE: Binette O, Steorts RC. (Almost) All of Entity Resolution. 2020 Aug 10;
MLA: Binette, Olivier, and Rebecca C. Steorts. (Almost) All of Entity Resolution. Aug. 2020.
NLM: Binette O, Steorts RC. (Almost) All of Entity Resolution. 2020 Aug 10;
