Skip to main content

Nonlinear sequential accepts and rejects for identification of top arms in stochastic bandits

Publication ,  Journal Article
Shahrampour, S; Tarokh, V
Published in: 55th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2017
July 1, 2017

We address the M-best-Arm identification problem in multi-Armed bandits. A player has a limited budget to explore K arms (M < K), and once pulled, each arm yields a reward drawn (independently) from a fixed, unknown distribution. The goal is to find the top M arms in the sense of expected reward. We develop an algorithm which proceeds in rounds to deactivate arms iteratively. At each round, the budget is divided by a nonlinear function of remaining arms, and the arms are pulled correspondingly. Based on a decision rule, the deactivated arm at each round may be accepted or rejected. The algorithm outputs the accepted arms that should ideally be the top M arms. We characterize the decay rate of the misidentification probability and establish that the nonlinear budget allocation proves to be useful for different problem environments (described by the number of competitive arms). We provide comprehensive numerical experiments showing that our algorithm outperforms the state-of-The-Art using suitable nonlinearity.

Duke Scholars

Published In

55th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2017

DOI

Publication Date

July 1, 2017

Volume

2018-January

Start / End Page

228 / 235
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Shahrampour, S., & Tarokh, V. (2017). Nonlinear sequential accepts and rejects for identification of top arms in stochastic bandits. 55th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2017, 2018-January, 228–235. https://doi.org/10.1109/ALLERTON.2017.8262742
Shahrampour, S., and V. Tarokh. “Nonlinear sequential accepts and rejects for identification of top arms in stochastic bandits.” 55th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2017 2018-January (July 1, 2017): 228–35. https://doi.org/10.1109/ALLERTON.2017.8262742.
Shahrampour S, Tarokh V. Nonlinear sequential accepts and rejects for identification of top arms in stochastic bandits. 55th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2017. 2017 Jul 1;2018-January:228–35.
Shahrampour, S., and V. Tarokh. “Nonlinear sequential accepts and rejects for identification of top arms in stochastic bandits.” 55th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2017, vol. 2018-January, July 2017, pp. 228–35. Scopus, doi:10.1109/ALLERTON.2017.8262742.
Shahrampour S, Tarokh V. Nonlinear sequential accepts and rejects for identification of top arms in stochastic bandits. 55th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2017. 2017 Jul 1;2018-January:228–235.

Published In

55th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2017

DOI

Publication Date

July 1, 2017

Volume

2018-January

Start / End Page

228 / 235