Learning to explore and exploit in POMDPs

Publication, Journal Article
Cai, C; Liao, X; Carin, L
Published in: Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference
January 1, 2009

A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration and exploitation in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem. Theoretical guarantees are provided concerning the optimality of the balancing of exploration and exploitation. The effectiveness of the method is demonstrated by experimental results on benchmark problems.

Published In

Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference

Publication Date

January 1, 2009

Start / End Page

198 / 206
 

Citation

APA: Cai, C., Liao, X., & Carin, L. (2009). Learning to explore and exploit in POMDPs. Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference, 198–206.
Chicago: Cai, C., X. Liao, and L. Carin. “Learning to explore and exploit in POMDPs.” Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference, January 1, 2009, 198–206.
ICMJE: Cai C, Liao X, Carin L. Learning to explore and exploit in POMDPs. Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference. 2009 Jan 1;198–206.
MLA: Cai, C., et al. “Learning to explore and exploit in POMDPs.” Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference, Jan. 2009, pp. 198–206.
NLM: Cai C, Liao X, Carin L. Learning to explore and exploit in POMDPs. Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference. 2009 Jan 1;198–206.
