Scholars@Duke publication: Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Publication , Conference

Jin, T; Tang, J; Xu, P; Huang, K; Xiao, X; Gu, Q

Published in: Proceedings of Machine Learning Research

January 1, 2021

In batched multi-armed bandit problems, the learner can adaptively pull arms and adjust strategy in batches. In many real applications, not only the regret but also the batch complexity need to be optimized. Existing batched bandit algorithms usually assume that the time horizon T is known in advance. However, many applications involve an unpredictable stopping time. In this paper, we study the anytime batched multi-armed bandit problem. We propose an anytime algorithm that achieves the asymptotically optimal regret for exponential families of reward distributions with O(log log T · ilogα(T))1 batches, where α ∈ OT(1). Moreover, we prove that for any constant c > 0, no algorithm can achieve the asymptotically optimal regret within clog log T batches.

Duke Scholars

Author Pan Xu Biostatistics & Bioinformatics, Division of Integrative Geno ...

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2021

Volume

139

Start / End Page

5065 / 5073

Citation

APA

Chicago

ICMJE

MLA

NLM

Jin, T., Tang, J., Xu, P., Huang, K., Xiao, X., & Gu, Q. (2021). Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits. In Proceedings of Machine Learning Research (Vol. 139, pp. 5065–5073).

Jin, T., J. Tang, P. Xu, K. Huang, X. Xiao, and Q. Gu. “Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits.” In Proceedings of Machine Learning Research, 139:5065–73, 2021.

Jin T, Tang J, Xu P, Huang K, Xiao X, Gu Q. Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits. In: Proceedings of Machine Learning Research. 2021. p. 5065–73.

Jin, T., et al. “Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits.” Proceedings of Machine Learning Research, vol. 139, 2021, pp. 5065–73.

Jin T, Tang J, Xu P, Huang K, Xiao X, Gu Q. Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits. Proceedings of Machine Learning Research. 2021. p. 5065–5073.

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2021

Volume

139

Start / End Page

5065 / 5073