Position: TRUSTLLM: Trustworthiness in Large Language Models

Publication, Conference
Huang, Y; Sun, L; Wang, H; Wu, S; Zhang, Q; Li, Y; Gao, C; Lyu, W; Zhang, Y; Li, X; Sun, H; Liu, Z; Liu, Y; Wang, Y; Zhang, Z; Vidgen, B ...
Published in: Proceedings of Machine Learning Research
January 1, 2024

Large language models (LLMs) have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. This paper introduces TRUSTLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs on TRUSTLLM, covering over 30 datasets. First, our findings show that, in general, trustworthiness and capability (i.e., functional effectiveness) are positively related. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones, suggesting that open-source models can achieve high levels of trustworthiness without additional mechanisms such as moderators, offering valuable insights for developers in this field. Third, some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Beyond these observations, we have uncovered key insights into the multifaceted trustworthiness of LLMs. We emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. We advocate establishing an AI alliance among industry, academia, and the open-source community to foster collaboration and advance the trustworthiness of LLMs. Our dataset, code, and toolkit will be available at https://github.com/HowieHwong/TrustLLM and the leaderboard is released at https://trustllmbenchmark.github.io/TrustLLM-Website/.
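To make the benchmark's structure concrete, the following minimal Python sketch mirrors its shape: six trustworthiness dimensions scored per model and rolled up into an overall figure. This is a hypothetical illustration, not the TrustLLM toolkit's API; the model names and scores are placeholders invented for this example.

```python
# Hypothetical sketch only: NOT the TrustLLM toolkit API.
# Mirrors the benchmark's shape (six trustworthiness dimensions,
# scored per model) with made-up placeholder numbers.
from statistics import mean

DIMENSIONS = [
    "truthfulness", "safety", "fairness",
    "robustness", "privacy", "machine_ethics",
]

# Placeholder scores in [0, 1]; a real evaluation would derive these
# from the benchmark's 30+ datasets, not hard-coded values.
results = {
    "proprietary_model": dict(zip(DIMENSIONS, [0.86, 0.93, 0.78, 0.81, 0.88, 0.84])),
    "open_source_model": dict(zip(DIMENSIONS, [0.80, 0.85, 0.74, 0.77, 0.82, 0.79])),
}

for model, dims in results.items():
    # A simple unweighted mean serves as the overall trustworthiness figure here.
    print(f"{model}: overall={mean(dims.values()):.3f}")
    for dim, score in sorted(dims.items(), key=lambda kv: -kv[1]):
        print(f"  {dim}: {score:.2f}")
```

In the paper itself, per-dimension scores come from dataset-specific metrics rather than a single unweighted mean, so treat this aggregation purely as an illustration of the benchmark's structure.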

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2024

Volume

235

Start / End Page

20166 / 20270
 

Citation

APA: Huang, Y., Sun, L., Wang, H., Wu, S., Zhang, Q., Li, Y., … Zhao, Y. (2024). Position: TRUSTLLM: Trustworthiness in Large Language Models. In Proceedings of Machine Learning Research (Vol. 235, pp. 20166–20270).

Chicago: Huang, Y., L. Sun, H. Wang, S. Wu, Q. Zhang, Y. Li, C. Gao, et al. “Position: TRUSTLLM: Trustworthiness in Large Language Models.” In Proceedings of Machine Learning Research, 235:20166–270, 2024.

ICMJE: Huang Y, Sun L, Wang H, Wu S, Zhang Q, Li Y, et al. Position: TRUSTLLM: Trustworthiness in Large Language Models. In: Proceedings of Machine Learning Research. 2024. p. 20166–270.

MLA: Huang, Y., et al. “Position: TRUSTLLM: Trustworthiness in Large Language Models.” Proceedings of Machine Learning Research, vol. 235, 2024, pp. 20166–270.

NLM: Huang Y, Sun L, Wang H, Wu S, Zhang Q, Li Y, Gao C, Lyu W, Zhang Y, Li X, Sun H, Liu Z, Liu Y, Wang Y, Zhang Z, Vidgen B, Kailkhura B, Xiong C, Xiao C, Li C, Xing E, Huang F, Liu H, Ji H, Zhang H, Yao H, Kellis M, Zitnik M, Jiang M, Bansal M, Zou J, Pei J, Liu J, Gao J, Han J, Zhao J, Tang J, Wang J, Vanschoren J, Mitchell JC, Shu K, Xu K, Chang KW, He L, Huang L, Backes M, Gong NZ, Yu PS, Chen PY, Gu Q, Xu R, Ying R, Ji S, Jana S, Chen T, Liu T, Zhou T, Wang W, Zhang X, Wang X, Xie X, Chen X, Ye Y, Cao Y, Chen Y, Zhao Y. Position: TRUSTLLM: Trustworthiness in Large Language Models. Proceedings of Machine Learning Research. 2024. p. 20166–20270.
