Scholars@Duke publication: Region-based value iteration for partially observable Markov decision processes

Region-based value iteration for partially observable Markov decision processes

Publication, Journal Article

Li, H; Liao, X; Carin, L

Published in: ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning

October 6, 2006

An approximate region-based value iteration (RBVI) algorithm is proposed to find the optimal policy for a partially observable Markov decision process (POMDP). The proposed RBVI approximates the true polyhedral partition of the belief simplex with an ellipsoidal partition, such that the optimal value function is linear in each of the ellipsoidal regions. The position and shape of each region, as well as the gradient (alpha-vector) of the optimal value function in the region, are parameterized explicitly and are estimated via efficient expectation maximization (EM) and variational Bayesian EM (VBEM), based on a set of selected sample belief points. The RBVI maintains a much smaller number of alpha-vectors than point-based methods and yields a more parsimonious representation that approximates the true value function in the maximum likelihood (ML) sense. The results on benchmark problems show that the proposed RBVI is comparable in performance to state-of-the-art algorithms, despite the small number of alpha-vectors used.
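The idea of a value function that is linear within each region of the belief simplex can be illustrated with a small sketch. The code below is not the authors' RBVI implementation and does not include the EM or VBEM estimation step; it only contrasts the standard piecewise-linear convex form V(b) = max_i alpha_i · b with a lookup that assigns a belief to the nearest ellipsoidal region and applies that region's alpha-vector. The function names, the use of Mahalanobis distance for region assignment, and the toy two-state numbers are all assumptions made for illustration.

```python
import numpy as np


def pwlc_value(belief, alpha_vectors):
    """Standard POMDP value: V(b) = max_i alpha_i . b over all alpha-vectors."""
    return float(np.max(alpha_vectors @ belief))


def region_value(belief, centers, precisions, alpha_vectors):
    """Illustrative region lookup (an assumption, not the paper's estimator).

    Each region k is described by a center and a precision (inverse shape)
    matrix. The belief is assigned to the region with the smallest squared
    Mahalanobis distance, and that region's alpha-vector gives the value.
    """
    diffs = centers - belief  # shape (K, n)
    d2 = np.einsum("ki,kij,kj->k", diffs, precisions, diffs)
    k = int(np.argmin(d2))
    return float(alpha_vectors[k] @ belief), k


# Hypothetical two-state example: one alpha-vector per region.
alphas = np.array([[1.0, 0.0], [0.0, 1.0]])
centers = np.array([[0.9, 0.1], [0.1, 0.9]])
precisions = np.stack([np.eye(2), np.eye(2)])

belief = np.array([0.3, 0.7])
value, region = region_value(belief, centers, precisions, alphas)
```

In this toy case the region lookup and the max over alpha-vectors agree, because the regions were placed to match the true linear pieces. In general an ellipsoidal partition only approximates the polyhedral one, so the two can differ near region boundaries.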

Duke Scholars

Author: Lawrence Carin, Electrical and Computer Engineering

Published In

ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning

Publication Date

October 6, 2006

Volume

2006

Start / End Page

561 / 568

Citation

APA: Li, H., Liao, X., & Carin, L. (2006). Region-based value iteration for partially observable Markov decision processes. ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning, 2006, 561–568.

Chicago: Li, H., X. Liao, and L. Carin. “Region-based value iteration for partially observable Markov decision processes.” ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning 2006 (October 6, 2006): 561–68.

ICMJE: Li H, Liao X, Carin L. Region-based value iteration for partially observable Markov decision processes. ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning. 2006 Oct 6;2006:561–8.

MLA: Li, H., et al. “Region-based value iteration for partially observable Markov decision processes.” ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning, vol. 2006, Oct. 2006, pp. 561–68.
