RECALL: Membership Inference via Relative Conditional Log-Likelihoods

Publication, Conference
Xie, R; Wang, J; Huang, R; Zhang, M; Ge, R; Pei, J; Gong, NZ; Dhingra, B
Published in: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
January 1, 2024

The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the data used in their pretraining. Detecting such content is challenging due to the scale of the data and the limited exposure of each instance during training. We propose RECALL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) that detects LLMs' pretraining data by leveraging their conditional language modeling capabilities. RECALL examines the relative change in conditional log-likelihoods when target data points are prefixed with non-member context. Our empirical findings show that conditioning member data on non-member prefixes induces a larger decrease in log-likelihood than conditioning non-member data on the same prefixes. We conduct comprehensive experiments and show that RECALL achieves state-of-the-art performance on the WikiMIA dataset, even with random and synthetic prefixes, and can be further improved using an ensemble approach. Moreover, we conduct an in-depth analysis of LLMs' behavior with different membership contexts, providing insights into how LLMs leverage membership information for effective inference at both the sequence and token level.
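The scoring idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not give a formula, so the sketch assumes the "relative change" is the ratio of the average token log-likelihood of the target with a non-member prefix to the same quantity without the prefix. The function names (`avg_log_likelihood`, `recall_score`) are hypothetical, and the per-token log-probabilities below are made-up toy values, not results from any model.

```python
from statistics import fmean


def avg_log_likelihood(token_logprobs):
    """Average per-token log-likelihood of a target sequence."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return fmean(token_logprobs)


def recall_score(cond_token_logprobs, uncond_token_logprobs):
    """Assumed form of the score: LL(x | prefix) / LL(x).

    cond_token_logprobs:   log p(x_i | prefix, x_<i) for the target tokens only
    uncond_token_logprobs: log p(x_i | x_<i) for the same target tokens
    Both averages are negative, so a larger drop after prefixing
    yields a larger ratio.
    """
    ll_cond = avg_log_likelihood(cond_token_logprobs)
    ll_uncond = avg_log_likelihood(uncond_token_logprobs)
    if ll_uncond == 0:
        raise ValueError("unconditional log-likelihood is zero; ratio undefined")
    return ll_cond / ll_uncond


# Toy values (illustrative only): the "member-like" text loses more
# log-likelihood when a non-member prefix is prepended.
member_like = recall_score([-1.5, -1.5], [-1.0, -1.0])
non_member_like = recall_score([-3.0, -3.0], [-3.0, -3.0])
print(member_like, non_member_like)
```

Under this assumed form, a higher score points toward membership, and a decision would come from thresholding the score. In practice the per-token log-probabilities would be read from the model under test, once with and once without the non-member prefix.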

Published In

EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

DOI

10.18653/v1/2024.emnlp-main.493

Publication Date

January 1, 2024

Start / End Page

8671 / 8689
 

Citation

APA: Xie, R., Wang, J., Huang, R., Zhang, M., Ge, R., Pei, J., … Dhingra, B. (2024). RECALL: Membership Inference via Relative Conditional Log-Likelihoods. In EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 8671–8689). https://doi.org/10.18653/v1/2024.emnlp-main.493

Chicago: Xie, R., J. Wang, R. Huang, M. Zhang, R. Ge, J. Pei, N. Z. Gong, and B. Dhingra. “RECALL: Membership Inference via Relative Conditional Log-Likelihoods.” In EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 8671–89, 2024. https://doi.org/10.18653/v1/2024.emnlp-main.493.

ICMJE: Xie R, Wang J, Huang R, Zhang M, Ge R, Pei J, et al. RECALL: Membership Inference via Relative Conditional Log-Likelihoods. In: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. 2024. p. 8671–89.

MLA: Xie, R., et al. “RECALL: Membership Inference via Relative Conditional Log-Likelihoods.” EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 2024, pp. 8671–89. Scopus, doi:10.18653/v1/2024.emnlp-main.493.

NLM: Xie R, Wang J, Huang R, Zhang M, Ge R, Pei J, Gong NZ, Dhingra B. RECALL: Membership Inference via Relative Conditional Log-Likelihoods. EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. 2024. p. 8671–8689.
