Scholars@Duke publication: MOTS: Minimax Optimal Thompson Sampling

MOTS: Minimax Optimal Thompson Sampling

Publication , Conference

Jin, T; Xu, P; Shi, J; Xiao, X; Gu, Q

Published in: Proceedings of Machine Learning Research

January 1, 2021

Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can match the minimax lower bound Ω(√KT) for K-armed bandit problems, where T is the total time horizon. In this paper, we solve this long open problem by proposing a variant of Thompson sampling called MOTS that adaptively clips the sampling instance of the chosen arm at each time step. We prove that this simple variant of Thompson sampling_achieves the minimax optimal regret bound O(√KT) for finite time horizon T, as well as the asymptotic optimal regret bound for Gaussian rewards when T approaches infinity. To our knowledge, MOTS is the first Thompson sampling type algorithm that achieves the minimax optimality for multi-armed bandit problems.

Duke Scholars

Author Pan Xu Biostatistics & Bioinformatics, Division of Translational Bi ...

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2021

Volume

139

Start / End Page

5074 / 5083

Citation

APA

Chicago

ICMJE

MLA

NLM

Jin, T., Xu, P., Shi, J., Xiao, X., & Gu, Q. (2021). MOTS: Minimax Optimal Thompson Sampling. In Proceedings of Machine Learning Research (Vol. 139, pp. 5074–5083).

Jin, T., P. Xu, J. Shi, X. Xiao, and Q. Gu. “MOTS: Minimax Optimal Thompson Sampling.” In Proceedings of Machine Learning Research, 139:5074–83, 2021.

Jin T, Xu P, Shi J, Xiao X, Gu Q. MOTS: Minimax Optimal Thompson Sampling. In: Proceedings of Machine Learning Research. 2021. p. 5074–83.

Jin, T., et al. “MOTS: Minimax Optimal Thompson Sampling.” Proceedings of Machine Learning Research, vol. 139, 2021, pp. 5074–83.

Jin T, Xu P, Shi J, Xiao X, Gu Q. MOTS: Minimax Optimal Thompson Sampling. Proceedings of Machine Learning Research. 2021. p. 5074–5083.

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2021

Volume

139

Start / End Page

5074 / 5083