Incremental least squares policy iteration for POMDPs

Publication, Journal Article
Li, H; Liao, X; Carin, L
Published in: Proceedings of the National Conference on Artificial Intelligence
November 13, 2006

We present a new algorithm, incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). ILSPI computes a basis representation of the infinite-horizon value function by minimizing the squared Bellman residual, and performs policy improvement on reachable belief states. The algorithm determines a number of optimal basis functions that minimize the Bellman residual incrementally, via efficient computations. We show that, with optimally determined basis functions, the policy can be improved successively on a set of the most probable belief points sampled from the reachable belief set. Because ILSPI is based on belief sample points, it is a point-based policy iteration method. Results on four benchmark problems show that ILSPI is competitive with its value-iteration counterparts in both performance and computational efficiency. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
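The policy-evaluation idea in the abstract, fitting a basis representation of the value function by minimizing the squared Bellman residual on sampled belief points, can be illustrated with a minimal sketch. This is not the ILSPI algorithm itself: it omits the incremental selection of basis functions and the policy-improvement step, and it assumes a fixed set of features and a fixed action per belief point. All names (`belief_update`, `evaluate_policy_lsq`, `features`) and the toy two-state model are hypothetical and chosen only for illustration.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Return (P(o | b, a), next belief) using Bayes' rule."""
    pred = b @ T[a]                  # predicted next-state distribution
    unnorm = pred * O[a][:, o]       # weight by observation likelihood
    p_o = unnorm.sum()
    nxt = unnorm / p_o if p_o > 0 else pred
    return p_o, nxt

def evaluate_policy_lsq(beliefs, actions, T, O, R, gamma, features):
    """Fit weights w so that V(b) = features(b) @ w minimizes the sum of
    squared Bellman residuals over the sampled belief points."""
    rows, targets = [], []
    for b, a in zip(beliefs, actions):
        expected_next = np.zeros_like(features(b))
        for o in range(O.shape[2]):
            p_o, nxt = belief_update(b, a, o, T, O)
            expected_next += p_o * features(nxt)
        # The residual is linear in w: (phi(b) - gamma * E[phi(b')]) @ w - r(b, a)
        rows.append(features(b) - gamma * expected_next)
        targets.append(b @ R[:, a])
    A = np.array(rows)
    y = np.array(targets)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w, A @ w - y

# Toy model (assumed for illustration): 2 states, 2 actions, 2 observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],      # T[a][s][s']
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],      # O[a][s'][o]
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.0, 0.5]])       # R[s][a]
gamma = 0.9

beliefs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.25, 0.75]])
actions = [0, 0, 0, 0]                       # fixed policy: always action 0

# Features linear in the belief, so w plays the role of a single alpha-vector.
w, residual = evaluate_policy_lsq(beliefs, actions, T, O, R, gamma, lambda b: b)
```

With features that are linear in the belief and a policy that always takes the same action, the least-squares problem has an exact solution, so the fitted residuals vanish on the sample points. Richer feature sets or belief-dependent actions would generally leave a nonzero residual.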

Published In

Proceedings of the National Conference on Artificial Intelligence

Publication Date

November 13, 2006

Volume

2

Start / End Page

1167 / 1172
 

Citation

APA: Li, H., Liao, X., & Carin, L. (2006). Incremental least squares policy iteration for POMDPs. Proceedings of the National Conference on Artificial Intelligence, 2, 1167–1172.
Chicago: Li, H., X. Liao, and L. Carin. “Incremental least squares policy iteration for POMDPs.” Proceedings of the National Conference on Artificial Intelligence 2 (November 13, 2006): 1167–72.
ICMJE: Li H, Liao X, Carin L. Incremental least squares policy iteration for POMDPs. Proceedings of the National Conference on Artificial Intelligence. 2006 Nov 13;2:1167–72.
MLA: Li, H., et al. “Incremental least squares policy iteration for POMDPs.” Proceedings of the National Conference on Artificial Intelligence, vol. 2, Nov. 2006, pp. 1167–72.
NLM: Li H, Liao X, Carin L. Incremental least squares policy iteration for POMDPs. Proceedings of the National Conference on Artificial Intelligence. 2006 Nov 13;2:1167–1172.
