Scholars@Duke publication: Approximation algorithms for partial-information based stochastic control with Markovian rewards

Approximation algorithms for partial-information based stochastic control with Markovian rewards

Publication , Conference

Guha, S; Munagala, K

Published in: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

December 1, 2007

We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according to an underlying on/off Markov process with known parameters. The evolution of the Markov chain happens irrespective of whether the arm is played, and furthermore, the exact state of the Markov chain is only revealed to the player when the arm is played and the reward observed. At most one arm (or in general, M arms) can be played any time step. The goal is to design a policy for playing the arms in order to maximize the infinite horizon time average expected reward. This problem is an instance of a Partially Observable Markov Decision Process (POMDP), and a special case of the notoriously intractable "rest-less bandit" problem. Unlike the stochastic MAB problem, the FEEDBACK MAB problem does not admit to greedy index-based optimal policies. The state of the system at any time step encodes the beliefs about the states of different arms, and the policy decisions change these beliefs - this aspect complicates the design and analysis of simple algorithms. We design a constant factor approximation to the FEEDBACK MAB problem by solving and rounding a natural LP relaxation to this problem. As far as we are aware, this is the first approximation algorithm for a POMDP problem. © 2007 IEEE.

Duke Scholars

Author Kamesh Munagala Computer Science

Published In

Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

DOI

10.1109/FOCS.2007.4389518

ISSN

0272-5428

ISBN

9780769530109

Publication Date

December 1, 2007

Start / End Page

483 / 493

Citation

APA

Chicago

ICMJE

MLA

NLM

Guha, S., & Munagala, K. (2007). Approximation algorithms for partial-information based stochastic control with Markovian rewards. In Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS (pp. 483–493). https://doi.org/10.1109/FOCS.2007.4389518

Guha, S., and K. Munagala. “Approximation algorithms for partial-information based stochastic control with Markovian rewards.” In Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS, 483–93, 2007. https://doi.org/10.1109/FOCS.2007.4389518.

Guha S, Munagala K. Approximation algorithms for partial-information based stochastic control with Markovian rewards. In: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS. 2007. p. 483–93.

Guha, S., and K. Munagala. “Approximation algorithms for partial-information based stochastic control with Markovian rewards.” Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS, 2007, pp. 483–93. Scopus, doi:10.1109/FOCS.2007.4389518.

Guha S, Munagala K. Approximation algorithms for partial-information based stochastic control with Markovian rewards. Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS. 2007. p. 483–493.

Published In

Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

DOI

10.1109/FOCS.2007.4389518

ISSN

0272-5428

ISBN

9780769530109

Publication Date

December 1, 2007

Start / End Page

483 / 493