Selecting data to clean for fact checking: Minimizing uncertainty vs. maximizing surprise

Publication, Journal Article
Sintos, S; Agarwal, PK; Yang, J
Published in: Proceedings of the VLDB Endowment
January 1, 2020

We study the optimization problem of selecting numerical quantities to clean in order to fact-check claims based on such data. Oftentimes, such claims are technically correct, but they can still mislead for two reasons. First, data may contain uncertainty and errors. Second, data can be “fished” to advance particular positions. In practice, fact-checkers cannot afford to clean all data and must choose to clean what “matters the most” to checking a claim. We explore alternative definitions of what “matters the most”: one is to ascertain claim qualities (by minimizing uncertainty in these measures), while an alternative is just to counter the claim (by maximizing the probability of finding a counterargument). We show whether the two objectives align with each other, with important implications on when fact-checkers should exercise care in selective data cleaning, to avoid potential bias introduced by their desire to counter claims. We develop efficient algorithms for solving the various variants of the optimization problem, showing significant improvements over naive solutions. The problem is particularly challenging because the objectives in the fact-checking context are complex, non-linear functions over data. We obtain results that generalize to a large class of functions, with potential applications beyond fact-checking.
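To make the two objectives in the abstract concrete, here is a minimal toy sketch. It is not the paper's algorithm: it uses brute-force enumeration, and it assumes a linear claim measure over independent, normally distributed values, whereas the abstract stresses that the real objectives are non-linear. All names (`remaining_variance`, `counter_probability`, `best_subset`), the cost model, and the example numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, built from math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def feasible_subsets(costs, budget):
    """Yield every subset of item indices whose total cleaning cost fits the budget."""
    n = len(costs)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(costs[i] for i in subset) <= budget:
                yield subset


def remaining_variance(subset, weights, stds):
    """Variance left in the linear measure sum(w_i * x_i) after cleaning `subset`.

    Assumption: values are independent, and cleaning removes all uncertainty.
    """
    return sum((weights[i] * stds[i]) ** 2
               for i in range(len(weights)) if i not in subset)


def counter_probability(subset, weights, means, stds, threshold):
    """Probability that the measure drops below `threshold` once `subset` is cleaned.

    Assumption: uncleaned values stay at their means; cleaned values are
    drawn from independent normals around their means.
    """
    current = sum(w * m for w, m in zip(weights, means))
    revealed_var = sum((weights[i] * stds[i]) ** 2 for i in subset)
    if revealed_var == 0.0:
        return 1.0 if current < threshold else 0.0
    return normal_cdf((threshold - current) / sqrt(revealed_var))


def best_subset(costs, budget, score):
    """Brute force: return the feasible subset with the highest score."""
    return max(feasible_subsets(costs, budget), key=score)


# Hypothetical claim: "this year's value exceeds the average of the previous two."
weights = [1.0, -0.5, -0.5]   # measure = x0 - (x1 + x2) / 2
means = [10.0, 8.0, 8.0]      # reported (uncleaned) values
stds = [0.5, 2.0, 1.0]        # assumed uncertainty of each value
costs = [1, 1, 1]             # cost of cleaning each value
budget = 1
threshold = 1.0               # a counterargument needs the measure below this

# Objective 1: minimize the uncertainty left in the claim measure.
min_var_pick = best_subset(
    costs, budget, lambda s: -remaining_variance(s, weights, stds))

# Objective 2: maximize the chance that cleaning yields a counterargument.
max_pr_pick = best_subset(
    costs, budget,
    lambda s: counter_probability(s, weights, means, stds, threshold))

print(min_var_pick, max_pr_pick)
```

In this toy setting both objectives select the same value to clean, the one contributing the most variance to the measure. That agreement is a property of the simplified linear, normal model used here and should not be read as a statement of the paper's findings.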

Published In

Proceedings of the VLDB Endowment

DOI

10.14778/3358701.3358708

EISSN

2150-8097

Publication Date

January 1, 2020

Volume

12

Issue

13

Start / End Page

2408 / 2421

Related Subject Headings

  • 4605 Data management and data science
  • 0807 Library and Information Studies
  • 0806 Information Systems
  • 0802 Computation Theory and Mathematics
 

Citation

APA: Sintos, S., Agarwal, P. K., & Yang, J. (2020). Selecting data to clean for fact checking: Minimizing uncertainty vs. maximizing surprise. Proceedings of the VLDB Endowment, 12(13), 2408–2421. https://doi.org/10.14778/3358701.3358708

Chicago: Sintos, S., P. K. Agarwal, and J. Yang. “Selecting data to clean for fact checking: Minimizing uncertainty vs. maximizing surprise.” Proceedings of the VLDB Endowment 12, no. 13 (January 1, 2020): 2408–21. https://doi.org/10.14778/3358701.3358708.

ICMJE: Sintos S, Agarwal PK, Yang J. Selecting data to clean for fact checking: Minimizing uncertainty vs. maximizing surprise. Proceedings of the VLDB Endowment. 2020 Jan 1;12(13):2408–21.

MLA: Sintos, S., et al. “Selecting data to clean for fact checking: Minimizing uncertainty vs. maximizing surprise.” Proceedings of the VLDB Endowment, vol. 12, no. 13, Jan. 2020, pp. 2408–21. Scopus, doi:10.14778/3358701.3358708.

NLM: Sintos S, Agarwal PK, Yang J. Selecting data to clean for fact checking: Minimizing uncertainty vs. maximizing surprise. Proceedings of the VLDB Endowment. 2020 Jan 1;12(13):2408–2421.
