The infinite regionalized policy representation


Journal Article

We introduce the infinite regionalized policy representation (iRPR), a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori and infers the number of states needed to represent the policy from the agent's experiences. We propose algorithms for learning the number of decision states while maintaining a proper balance between exploration and exploitation. Convergence analysis is provided, along with performance evaluations on benchmark problems. Copyright 2011 by the author(s)/owner(s).

Cited Authors

  • Liu, M; Liao, X; Carin, L

Published Date

  • October 7, 2011

Published In

  • Proceedings of the 28th International Conference on Machine Learning, ICML 2011

Start / End Page

  • 769 - 776

Citation Source

  • Scopus