Detecting implicit biases of large language models with Bayesian hypothesis testing.

Publication, Journal Article
Si, S; Jiang, X; Su, Q; Carin, L
Published in: Scientific Reports
April 2025

Despite the remarkable performance of large language models (LLMs), such as generative pre-trained Transformers (GPTs), across various tasks, they often perpetuate social biases and stereotypes embedded in their training data. In this paper, we introduce a novel framework that reformulates bias detection in LLMs as a hypothesis testing problem, where the null hypothesis H₀ represents the absence of implicit bias. Our framework leverages binary-choice questions to measure social bias in both open-source LLMs and proprietary LLMs accessible via APIs. We demonstrate the flexibility of our approach by integrating classical statistical methods, such as the exact binomial test, with Bayesian inference using Bayes factors for bias detection and quantification. Extensive experiments are conducted on prominent models, including ChatGPT (GPT-3.5-Turbo), DeepSeek-V3, and Llama-3.1-70B, utilizing publicly available datasets such as BBQ, CrowS-Pairs (in both English and French), and Winogender. While the exact binomial test fails to distinguish between no evidence of bias and evidence of no bias, our results underscore the advantages of Bayes factors, particularly their capacity to quantify evidence for both competing hypotheses and their robustness to small sample sizes. Additionally, our experiments reveal that the bias behavior of LLMs is largely consistent across the English and French versions of the CrowS-Pairs dataset, with subtle differences likely arising from variations in social norms across linguistic and cultural contexts.
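The two tests named in the abstract are easy to make concrete. Below is a minimal sketch (not the authors' released code) of both procedures, assuming bias is probed with n binary-choice questions of which k draw the stereotyped answer, H₀: p = 0.5 (no implicit bias), and a uniform Beta(1, 1) prior on p under H₁; the prior choice and the toy counts are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the two tests from the abstract: an exact binomial test of
# H0: p = 0.5, and a Bayes factor BF_01 comparing H0 against H1 with a
# Beta(a, b) prior on p. Prior and counts here are illustrative only.
from math import lgamma, log, exp
from scipy.stats import binomtest

def exact_binomial_pvalue(k: int, n: int) -> float:
    """Two-sided exact binomial test of H0: p = 0.5."""
    return binomtest(k, n, p=0.5, alternative="two-sided").pvalue

def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_factor_01(k: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """BF_01 = P(k | H0) / P(k | H1) with a Beta(a, b) prior under H1.

    P(k | H0) = C(n, k) * 0.5**n
    P(k | H1) = C(n, k) * B(k + a, n - k + b) / B(a, b)
    The binomial coefficient cancels in the ratio, so it is omitted.
    """
    log_bf = n * log(0.5) - (log_beta(k + a, n - k + b) - log_beta(a, b))
    return exp(log_bf)

# Toy usage: 60 stereotyped answers out of 100 binary-choice questions.
k, n = 60, 100
print(f"exact binomial p-value: {exact_binomial_pvalue(k, n):.4f}")
print(f"BF_01 (evidence for no bias): {bayes_factor_01(k, n):.3f}")
# BF_01 > 1 favors H0 (no bias); BF_01 < 1 favors H1 (bias). Unlike a
# p-value, BF_01 can quantify evidence *for* the no-bias hypothesis.
```

Working in log space avoids overflow in the Beta functions for large n; note how the Bayes factor, unlike the p-value, can report evidence in favor of H₀ rather than merely failing to reject it, which is the distinction the abstract emphasizes.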


Published In

Scientific Reports

DOI

10.1038/s41598-025-95825-x
EISSN

2045-2322

ISSN

2045-2322

Publication Date

April 2025

Volume

15

Issue

1

Start / End Page

12415

Related Subject Headings

  • Large Language Models
  • Humans
  • Bias
  • Bayes Theorem
 

Citation

APA: Si, S., Jiang, X., Su, Q., & Carin, L. (2025). Detecting implicit biases of large language models with Bayesian hypothesis testing. Scientific Reports, 15(1), 12415. https://doi.org/10.1038/s41598-025-95825-x

Chicago: Si, Shijing, Xiaoming Jiang, Qinliang Su, and Lawrence Carin. “Detecting implicit biases of large language models with Bayesian hypothesis testing.” Scientific Reports 15, no. 1 (April 2025): 12415. https://doi.org/10.1038/s41598-025-95825-x.

ICMJE: Si S, Jiang X, Su Q, Carin L. Detecting implicit biases of large language models with Bayesian hypothesis testing. Scientific Reports. 2025 Apr;15(1):12415.

MLA: Si, Shijing, et al. “Detecting implicit biases of large language models with Bayesian hypothesis testing.” Scientific Reports, vol. 15, no. 1, Apr. 2025, p. 12415. Epmc, doi:10.1038/s41598-025-95825-x.

NLM: Si S, Jiang X, Su Q, Carin L. Detecting implicit biases of large language models with Bayesian hypothesis testing. Scientific Reports. 2025 Apr;15(1):12415.
