Incremental least squares policy iteration for POMDPs

Published

Journal Article

We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding an infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). ILSPI computes a basis representation of the infinite-horizon value function by minimizing the squared Bellman residual, and performs policy improvement on reachable belief states. The algorithm determines a number of optimal basis functions incrementally, using efficient computations, so as to minimize the Bellman residual. We show that, with these optimally determined basis functions, the policy can be improved successively on a set of most probable belief points sampled from the reachable belief set. Because ILSPI operates on sampled belief points, it is a point-based policy iteration method. Results on four benchmark problems show that ILSPI is competitive with its value-iteration counterparts in both performance and computational efficiency. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
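To make the idea of minimizing the squared Bellman residual on belief points concrete, here is a minimal sketch of the policy-evaluation step only. It is not the authors' implementation: the two-state POMDP, its numbers, the single fixed action, the choice of the belief itself as the basis, and all function names are assumptions made for illustration. The incremental basis selection and the policy-improvement step described in the abstract are not shown.

```python
# Hypothetical 2-state, 2-observation POMDP under one fixed action.
# All numbers below are made up for illustration.
GAMMA = 0.9
T = [[0.8, 0.2], [0.3, 0.7]]   # T[s][s2]: transition probabilities
O = [[0.9, 0.1], [0.2, 0.8]]   # O[s2][o]: observation probabilities
R = [1.0, 0.0]                 # R[s]: immediate reward

def features(b):
    """Basis functions phi(b); here simply the belief vector (an assumption)."""
    return list(b)

def successors(b):
    """Return (P(o | b), updated belief) for each observation o."""
    pred = [sum(b[s] * T[s][s2] for s in range(2)) for s2 in range(2)]
    out = []
    for o in range(2):
        joint = [pred[s2] * O[s2][o] for s2 in range(2)]
        p_o = sum(joint)
        if p_o > 0:
            out.append((p_o, [j / p_o for j in joint]))
    return out

def residual_row(b):
    """Return (d, r) such that the Bellman residual at b is d . w - r."""
    phi = features(b)
    exp_next = [0.0, 0.0]
    for p_o, b2 in successors(b):
        exp_next = [e + p_o * f for e, f in zip(exp_next, features(b2))]
    d = [phi[k] - GAMMA * exp_next[k] for k in range(2)]
    r = sum(b[s] * R[s] for s in range(2))
    return d, r

def fit_weights(beliefs):
    """Minimize the sum of squared residuals via the 2x2 normal equations."""
    a11 = a12 = a22 = c1 = c2 = 0.0
    for b in beliefs:
        d, r = residual_row(b)
        a11 += d[0] * d[0]
        a12 += d[0] * d[1]
        a22 += d[1] * d[1]
        c1 += d[0] * r
        c2 += d[1] * r
    det = a11 * a22 - a12 * a12
    return [(c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det]

def residual(b, w):
    """Bellman residual of the fitted value function at belief b."""
    d, r = residual_row(b)
    return sum(dk * wk for dk, wk in zip(d, w)) - r

# Sampled belief points (here an evenly spaced grid, as a stand-in).
beliefs = [[p / 4, 1 - p / 4] for p in range(5)]
w = fit_weights(beliefs)
max_res = max(abs(residual(b, w)) for b in beliefs)
```

In this toy setting the value function of a fixed-action policy is exactly linear in the belief, so the fitted weights drive the residual to zero up to rounding. With a policy that depends on the belief, or a richer problem, a fixed basis generally leaves a nonzero residual, which is the situation in which choosing basis functions carefully matters.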

Cited Authors

  • Li, H; Liao, X; Carin, L

Published Date

  • November 13, 2006

Published In

  • Proceedings of the National Conference on Artificial Intelligence

Volume / Issue

  • 2 /

Start / End Page

  • 1167 - 1172

Citation Source

  • Scopus