The infinite regionalized policy representation
Publication, Journal Article
Liu, M; Liao, X; Carin, L
Published in: Proceedings of the 28th International Conference on Machine Learning, ICML 2011
October 7, 2011
We introduce the infinite regionalized policy representation (iRPR) as a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori and infers the number of states needed to represent the policy from the agent's experiences. We propose algorithms for learning the number of decision states while maintaining a proper balance between exploration and exploitation. Convergence analysis is provided, along with performance evaluations on benchmark problems.
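For readers unfamiliar with nonparametric policies, the sketch below illustrates the core idea the abstract describes: a set of decision states that is unbounded a priori and is instantiated lazily as experience accumulates, here via a Chinese-restaurant-process prior. This is a minimal, hypothetical illustration, not the authors' iRPR algorithm: the class name, the concentration parameter alpha, the action count, and the uniform action rule are assumptions made for this example only; the actual iRPR learns its policy from rewards while balancing exploration and exploitation.

# Minimal sketch (assumption-laden, not the iRPR itself): a CRP prior lets
# the number of decision states grow with experience instead of being fixed.
import numpy as np

rng = np.random.default_rng(0)

class CRPPolicySketch:
    def __init__(self, alpha=1.0, n_actions=4):
        self.alpha = alpha          # CRP concentration: higher -> more states
        self.n_actions = n_actions
        self.counts = []            # visit counts per instantiated state

    def sample_state(self):
        """Sample a decision state; may instantiate a brand-new one."""
        n = sum(self.counts)
        # Existing states are chosen in proportion to their visit counts;
        # the final slot is the CRP's "new table" with weight alpha.
        probs = np.array(self.counts + [self.alpha], dtype=float)
        probs /= (n + self.alpha)
        z = rng.choice(len(probs), p=probs)
        if z == len(self.counts):   # a previously unseen decision state
            self.counts.append(0)
        self.counts[z] += 1
        return z

    def act(self, z):
        """Placeholder uniform action rule; the real iRPR learns this."""
        return rng.integers(self.n_actions)

policy = CRPPolicySketch(alpha=2.0)
for t in range(200):
    z = policy.sample_state()
    a = policy.act(z)
print("decision states instantiated after 200 steps:", len(policy.counts))

The number of instantiated states grows roughly logarithmically with the number of steps under a CRP, which is what allows such a model to "infer the number of states" from data rather than fixing it in advance.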
Published In
Proceedings of the 28th International Conference on Machine Learning, ICML 2011
Publication Date
October 7, 2011
Start / End Page
769 / 776
Citation
APA
Liu, M., Liao, X., & Carin, L. (2011). The infinite regionalized policy representation. Proceedings of the 28th International Conference on Machine Learning, ICML 2011, 769–776.
Chicago
Liu, M., X. Liao, and L. Carin. “The infinite regionalized policy representation.” Proceedings of the 28th International Conference on Machine Learning, ICML 2011, October 7, 2011, 769–76.
ICMJE
Liu M, Liao X, Carin L. The infinite regionalized policy representation. Proceedings of the 28th International Conference on Machine Learning, ICML 2011. 2011 Oct 7;769–76.
MLA
Liu, M., et al. “The infinite regionalized policy representation.” Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Oct. 2011, pp. 769–76.
NLM
Liu M, Liao X, Carin L. The infinite regionalized policy representation. Proceedings of the 28th International Conference on Machine Learning, ICML 2011. 2011 Oct 7;769–76.