Aggregate queries on probabilistic record linkages

Publication, Conference
Hua, M; Pei, J
Published in: ACM International Conference Proceeding Series
July 10, 2012

Record linkage analysis, which matches records that refer to the same real-world entities across different data sets, is an important task in data integration. Uncertainty often exists in record linkages because data are incomplete or ambiguous. Fortunately, state-of-the-art probabilistic record linkage methods can compute the probability that two records refer to the same entity. In this paper, we study novel aggregate queries on probabilistic record linkages, such as counting the number of matched records. We address several fundamental issues. First, we advocate that the answer to an aggregate query on probabilistic record linkages is a probability distribution over possible answers derived from possible worlds. Second, we identify the category of compatible linkages, the only category on which the answers to aggregate queries can be determined properly when the probabilities of individual linkages are available but the joint distributions of multiple linkages are unavailable. Third, we give a quadratic exact algorithm and two approximation algorithms to answer aggregate queries. © 2012 ACM.
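To make the possible-worlds reading of a count query concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes, purely for illustration, that the linkages are mutually independent, which is a stronger assumption than the abstract states; the paper's notion of compatible linkages is not reproduced here. The function names and the example probabilities are hypothetical. Under that assumption, the distribution of the number of true linkages can be built one linkage at a time in quadratic time, and a brute-force enumeration of all possible worlds serves as a cross-check.

```python
from itertools import product


def count_distribution(probs):
    """Distribution of the number of true linkages.

    ASSUMPTION (not from the paper): linkages are mutually independent.
    dist[k] is the probability that exactly k linkages hold.
    Runs in O(n^2) time for n linkages.
    """
    dist = [1.0]  # with zero linkages, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this linkage does not hold
            nxt[k + 1] += mass * p       # this linkage holds
        dist = nxt
    return dist


def count_distribution_bruteforce(probs):
    """Same distribution by enumerating all 2^n possible worlds."""
    dist = [0.0] * (len(probs) + 1)
    for world in product([False, True], repeat=len(probs)):
        weight = 1.0
        for holds, p in zip(world, probs):
            weight *= p if holds else (1.0 - p)
        dist[sum(world)] += weight
    return dist


if __name__ == "__main__":
    # Hypothetical linkage probabilities, chosen only for illustration.
    linkage_probs = [0.9, 0.6, 0.3]
    for k, mass in enumerate(count_distribution(linkage_probs)):
        print(f"P(count = {k}) = {mass:.4f}")
```

The answer is a full distribution over counts, not a single number, which matches the first point of the abstract. When linkages are not independent, for example when one record can match at most one of several candidates, this sketch no longer applies as written.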

Published In: ACM International Conference Proceeding Series

DOI: 10.1145/2247596.2247639

Publication Date: July 10, 2012

Start / End Page: 360 / 371
 

Citation

APA: Hua, M., & Pei, J. (2012). Aggregate queries on probabilistic record linkages. In ACM International Conference Proceeding Series (pp. 360–371). https://doi.org/10.1145/2247596.2247639

Chicago: Hua, M., and J. Pei. “Aggregate queries on probabilistic record linkages.” In ACM International Conference Proceeding Series, 360–71, 2012. https://doi.org/10.1145/2247596.2247639.

ICMJE: Hua M, Pei J. Aggregate queries on probabilistic record linkages. In: ACM International Conference Proceeding Series. 2012. p. 360–71.

MLA: Hua, M., and J. Pei. “Aggregate queries on probabilistic record linkages.” ACM International Conference Proceeding Series, 2012, pp. 360–71. Scopus, doi:10.1145/2247596.2247639.

NLM: Hua M, Pei J. Aggregate queries on probabilistic record linkages. ACM International Conference Proceeding Series. 2012. p. 360–371.
