Real-time Factuality Assessment from Adversarial Feedback

Publication, Conference
Chen, S; Huang, Y; Dhingra, B
Published in: Proceedings of the Annual Meeting of the Association for Computational Linguistics
January 1, 2025

We show that existing evaluations for assessing the factuality of news from conventional sources, such as claims on fact-checking websites, result in high accuracies over time for LLM-based detectors, even after their knowledge cutoffs. This suggests that recent popular false information from such sources can be easily identified due to its likely presence in pretraining/retrieval corpora or the emergence of salient, yet shallow, patterns in these datasets. Instead, we argue that a proper factuality evaluation dataset should test a model's ability to reason about current events by retrieving and reading related evidence. To this end, we develop a novel pipeline that leverages natural language feedback from a RAG-based detector to iteratively modify real-time news into deceptive variants that challenge LLMs. Our iterative rewrite decreases the binary classification ROC-AUC by an absolute 17.5 percent for a strong RAG-based GPT-4o detector. Our experiments reveal the important role of RAG in both evaluating and generating challenging news examples, as retrieval-free LLM detectors are vulnerable to unseen events and adversarial attacks, while feedback from RAG-based evaluation helps discover more deceitful patterns.
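The headline metric in the abstract can be made concrete with a small sketch. The Python below is not the authors' evaluation code; it is a minimal illustration, assuming that "an absolute 17.5 percent" means a difference of 17.5 percentage points between two ROC-AUC values. The function names `roc_auc` and `absolute_drop_points` and all scores are hypothetical, toy values chosen for illustration and are not results from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via its pairwise definition: the probability that a randomly
    chosen positive example receives a higher score than a randomly chosen
    negative example, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def absolute_drop_points(auc_before: float, auc_after: float) -> float:
    """Absolute change in ROC-AUC, expressed in percentage points."""
    return (auc_before - auc_after) * 100.0


# Toy data (NOT from the paper): label 1 = deceptive item, 0 = real item.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical detector scores ("probability the item is false") on the
# original deceptive items ...
scores_original = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
# ... and after two of the deceptive items were rewritten to look more credible.
scores_rewritten = [0.9, 0.35, 0.7, 0.25, 0.4, 0.3, 0.2, 0.1]

auc_before = roc_auc(labels, scores_original)
auc_after = roc_auc(labels, scores_rewritten)
print(f"ROC-AUC before rewrite: {auc_before:.4f}")
print(f"ROC-AUC after rewrite:  {auc_after:.4f}")
print(f"Absolute drop: {absolute_drop_points(auc_before, auc_after):.2f} points")
```

Under this reading, a detector whose ROC-AUC moved from, say, 0.90 to 0.725 would show an absolute drop of 17.5 points, which is distinct from a 17.5 percent relative reduction.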

Published In

Proceedings of the Annual Meeting of the Association for Computational Linguistics

ISSN

0736-587X

Publication Date

January 1, 2025

Volume

1

Start / End Page

1610 / 1630

Citation

APA: Chen, S., Huang, Y., & Dhingra, B. (2025). Real-time Factuality Assessment from Adversarial Feedback. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 1610–1630).

Chicago: Chen, S., Y. Huang, and B. Dhingra. “Real-time Factuality Assessment from Adversarial Feedback.” In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 1:1610–30, 2025.

ICMJE: Chen S, Huang Y, Dhingra B. Real-time Factuality Assessment from Adversarial Feedback. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2025. p. 1610–30.

MLA: Chen, S., et al. “Real-time Factuality Assessment from Adversarial Feedback.” Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, 2025, pp. 1610–30.

NLM: Chen S, Huang Y, Dhingra B. Real-time Factuality Assessment from Adversarial Feedback. Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2025. p. 1610–1630.