Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications

Publication, Conference

Wang, J; Yang, T; Xie, R; Dhingra, B

Published in: Proceedings of the Annual Meeting of the Association for Computational Linguistics

January 1, 2024

With the proliferation of LLM-integrated applications such as GPT-s, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed queries. To help mitigate this problem, we introduce the Raccoon benchmark which comprehensively evaluates a model's susceptibility to prompt extraction attacks. Our novel evaluation method assesses models under both defenseless and defended scenarios, employing a dual approach to evaluate the effectiveness of existing defenses and the resilience of the models. The benchmark encompasses 14 categories of prompt extraction attacks, with additional compounded attacks that closely mimic the strategies of potential attackers, alongside a diverse collection of defense templates. This array is, to our knowledge, the most extensive compilation of prompt theft attacks and defense mechanisms to date. Our findings highlight universal susceptibility to prompt theft in the absence of defenses, with OpenAI models demonstrating notable resilience when protected. This paper aims to establish a more systematic benchmark for assessing LLM robustness against prompt extraction attacks, offering insights into their causes and potential countermeasures.

Duke Scholars

Author: Bhuwan Dhingra, Computer Science

Published In

Proceedings of the Annual Meeting of the Association for Computational Linguistics

DOI

10.18653/v1/2024.findings-acl.791

ISSN

0736-587X

Publication Date

January 1, 2024

Start / End Page

13349 / 13365

Citation

APA: Wang, J., Yang, T., Xie, R., & Dhingra, B. (2024). Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 13349–13365). https://doi.org/10.18653/v1/2024.findings-acl.791

Chicago: Wang, J., T. Yang, R. Xie, and B. Dhingra. “Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications.” In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 13349–65, 2024. https://doi.org/10.18653/v1/2024.findings-acl.791.

ICMJE: Wang J, Yang T, Xie R, Dhingra B. Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2024. p. 13349–65.

MLA: Wang, J., et al. “Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications.” Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2024, pp. 13349–65. Scopus, doi:10.18653/v1/2024.findings-acl.791.

NLM: Wang J, Yang T, Xie R, Dhingra B. Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications. Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2024. p. 13349–13365.
