Thompson Sampling with Less Exploration is Fast and Optimal

Publication, Conference
Jin, T; Yang, X; Xiao, X; Xu, P
Published in: Proceedings of Machine Learning Research
January 1, 2023

We propose ϵ-Exploring Thompson Sampling (ϵ-TS), a modified version of the Thompson Sampling (TS) algorithm (Agrawal & Goyal, 2017) for multi-armed bandits. In ϵ-TS, arms are selected greedily based on empirical mean rewards with probability 1 − ϵ, and based on posterior samples obtained from TS with probability ϵ, where ϵ ∈ (0, 1) is a user-defined constant. By reducing exploration, ϵ-TS improves computational efficiency over TS while achieving better regret bounds. We establish that ϵ-TS is both minimax optimal and asymptotically optimal for various popular reward distributions, including Gaussian, Bernoulli, Poisson, and Gamma. A key technical advancement in our analysis is the relaxation of the stringent anti-concentration bound on the posterior distribution that was required in recent analyses achieving similar bounds (Jin et al., 2021b; 2022). As a result, ϵ-TS retains the posterior update structure of TS with minimal alterations, avoiding modifications used in previous work such as clipping the sampling distribution or inverting the Kullback-Leibler (KL) divergence between reward distributions. Furthermore, our algorithm is as easy to implement as TS, but runs significantly faster due to reduced exploration. Empirical evaluations confirm the efficiency and optimality of ϵ-TS.
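The arm-selection rule described in the abstract can be sketched in a few lines. The following is an illustrative implementation for Gaussian rewards only, not the paper's exact algorithm: the warm-start (playing each arm once) and the posterior form N(empirical mean, σ²/n) are assumptions made for the sketch, and the function name `eps_ts_gaussian` is invented here.

```python
import numpy as np

rng = np.random.default_rng(0)

def eps_ts_gaussian(means, horizon=2000, eps=0.1, sigma=1.0):
    """Hedged sketch of eps-TS on a Gaussian-reward bandit.

    With probability 1 - eps, pick the arm with the highest empirical
    mean (greedy step); with probability eps, draw one posterior sample
    per arm and pick the argmax (TS step). Returns cumulative regret.
    """
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            # Assumed warm-start: play each arm once so means are defined.
            arm = t
        elif rng.random() < eps:
            # Exploration step: one Gaussian posterior sample per arm,
            # assumed posterior N(empirical mean, sigma^2 / n).
            samples = sums / counts + sigma * rng.standard_normal(k) / np.sqrt(counts)
            arm = int(np.argmax(samples))
        else:
            # Exploitation step: greedy on empirical means.
            arm = int(np.argmax(sums / counts))
        reward = means[arm] + sigma * rng.standard_normal()
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret
```

Note that the greedy branch requires no posterior sampling at all, which is where the computational savings over vanilla TS come from: only an ϵ-fraction of rounds touch the posterior.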

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2023

Volume

202

Start / End Page

15239 / 15261
Citation

Jin, T., Yang, X., Xiao, X., & Xu, P. (2023). Thompson Sampling with Less Exploration is Fast and Optimal. In Proceedings of Machine Learning Research (Vol. 202, pp. 15239–15261).
