Skip to main content

Approximate indexability and bandit problems with concave rewards and delayed feedback

Publication ,  Conference
Guha, S; Munagala, K
Published in: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics
October 15, 2013

We consider two stochastic multi-armed bandit problems in this paper in the Bayesian setting. In the first problem the accrued reward in a step is a concave function (such as the maximum) of the observed values of the arms played in that step. In the second problem, the observed value from a play of arm i is revealed after δi steps. Both of these problems have been considered in the bandit literature but no solutions with provably good performance guarantees are known over short horizons. The two problems are similar in the sense that the reward (for the first) or the available information (for the second) derived from an arm is not a function of just the current play of that arm. This interdependence between arms renders most existing analysis techniques inapplicable. A fundamental question in regard to optimization in the bandit setting is indexability, i.e., the existence of near optimal index policies. Index policies in these contexts correspond to policies over suitable single arm state spaces which are combined into a global policy. They are extremely desirable due to their simplicity and perceived robustness, but standard index policies do not provide any guarantees for these two problems. We construct O(1) approximate (near) index policies in polynomial time for both problems. The analysis identifies a suitable subset of states for each arm such that index policies that focus on only those subsets are O(1)-approximate. © 2013 Springer-Verlag.

Duke Scholars

Published In

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

DOI

EISSN

1611-3349

ISSN

0302-9743

Publication Date

October 15, 2013

Volume

8096 LNCS

Start / End Page

189 / 204

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA
Chicago
ICMJE
MLA
NLM
Guha, S., & Munagala, K. (2013). Approximate indexability and bandit problems with concave rewards and delayed feedback. In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics (Vol. 8096 LNCS, pp. 189–204). https://doi.org/10.1007/978-3-642-40328-6_14
Guha, S., and K. Munagala. “Approximate indexability and bandit problems with concave rewards and delayed feedback.” In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 8096 LNCS:189–204, 2013. https://doi.org/10.1007/978-3-642-40328-6_14.
Guha S, Munagala K. Approximate indexability and bandit problems with concave rewards and delayed feedback. In: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2013. p. 189–204.
Guha, S., and K. Munagala. “Approximate indexability and bandit problems with concave rewards and delayed feedback.” Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 8096 LNCS, 2013, pp. 189–204. Scopus, doi:10.1007/978-3-642-40328-6_14.
Guha S, Munagala K. Approximate indexability and bandit problems with concave rewards and delayed feedback. Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2013. p. 189–204.

Published In

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

DOI

EISSN

1611-3349

ISSN

0302-9743

Publication Date

October 15, 2013

Volume

8096 LNCS

Start / End Page

189 / 204

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences