(Almost) all of entity resolution.

Journal Article (Review)

Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals that have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all of these applications share a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a task commonly known as structured entity resolution (also called record linkage or deduplication). Here, we review motivational applications and seminal papers that have led to the growth of this area. We review modern probabilistic and Bayesian methods in statistics, computer science, machine learning, database management, economics, political science, and other disciplines that are used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Last, we discuss current research topics of practical importance.

Cited Authors

  • Binette, O; Steorts, RC

Published Date

  • March 2022

Published In

  • Science Advances

Volume / Issue

  • 8 / 12

Start / End Page

  • eabi8021

PubMed ID

  • 35333582

Electronic International Standard Serial Number (EISSN)

  • 2375-2548

International Standard Serial Number (ISSN)

  • 2375-2548

Digital Object Identifier (DOI)

  • 10.1126/sciadv.abi8021

Language

  • English