Scholars@Duke publication: Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

Publication , Conference

Jin, T; Hsu, HL; Chang, W; Xu, P

Published in: Proceedings of the Aaai Conference on Artificial Intelligence

March 25, 2024

We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored into overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm and derived a Bayesian regret bound. However, it remains an open problem how to derive a frequentist regret bound for Thompson sampling in this multi-agent setting. To address these issues, we propose an efficient variant of MATS, the epsilon-exploring Multi-Agent Thompson Sampling (ϵ-MATS) algorithm, which performs MATS exploration with probability epsilon while adopts a greedy policy otherwise. We prove that ϵ-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies our frequentist regret upper bound is optimal up to constant and logarithm terms, when the hypergraph is sufficiently sparse. Thorough experiments on standard MAMAB problems demonstrate the superior performance and the improved computational efficiency of ϵ-MATS compared with existing algorithms in the same setting.

Duke Scholars

Author Hao-Lun Hsu

Author Pan Xu Biostatistics & Bioinformatics, Division of Translational Bi ...

Published In

Proceedings of the Aaai Conference on Artificial Intelligence

DOI

10.1609/aaai.v38i11.29193

EISSN

2374-3468

ISSN

2159-5399

Publication Date

March 25, 2024

Volume

Issue

Start / End Page

12956 / 12964

Citation

APA

Chicago

ICMJE

MLA

NLM

Jin, T., Hsu, H. L., Chang, W., & Xu, P. (2024). Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs. In Proceedings of the Aaai Conference on Artificial Intelligence (Vol. 38, pp. 12956–12964). https://doi.org/10.1609/aaai.v38i11.29193

Jin, T., H. L. Hsu, W. Chang, and P. Xu. “Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs.” In Proceedings of the Aaai Conference on Artificial Intelligence, 38:12956–64, 2024. https://doi.org/10.1609/aaai.v38i11.29193.

Jin T, Hsu HL, Chang W, Xu P. Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs. In: Proceedings of the Aaai Conference on Artificial Intelligence. 2024. p. 12956–64.

Jin, T., et al. “Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs.” Proceedings of the Aaai Conference on Artificial Intelligence, vol. 38, no. 11, 2024, pp. 12956–64. Scopus, doi:10.1609/aaai.v38i11.29193.

Jin T, Hsu HL, Chang W, Xu P. Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs. Proceedings of the Aaai Conference on Artificial Intelligence. 2024. p. 12956–12964.

Published In

Proceedings of the Aaai Conference on Artificial Intelligence

DOI

10.1609/aaai.v38i11.29193

EISSN

2374-3468

ISSN

2159-5399

Publication Date

March 25, 2024

Volume

Issue

Start / End Page

12956 / 12964