Incremental least squares policy iteration for POMDPs

Publication, Journal Article

Li, H; Liao, X; Carin, L

Published in: Proceedings of the National Conference on Artificial Intelligence

November 13, 2006

We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function by minimizing the squared Bellman residual, and performs policy improvement in reachable belief states. The algorithm determines a number of optimal basis functions that minimize the Bellman residual incrementally, via efficient computations. We show that, by using optimally determined basis functions, the policy can be improved successively on a set of the most probable belief points sampled from the reachable belief set. Because ILSPI is based on belief sample points, it is a point-based policy iteration method. Results on four benchmark problems show that ILSPI is competitive with its value-iteration counterparts in terms of both performance and computational efficiency. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
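As a rough illustration of the Bellman-residual idea in the abstract, the sketch below fits a linear value function over sampled belief points for a fixed policy by least squares. It is not the ILSPI algorithm from the paper: the toy two-state POMDP, the names `belief_update` and `fit_value_weights`, the identity basis `phi(b) = b`, and the constant-action policy are all assumptions made for illustration, and the incremental basis selection and policy improvement steps are omitted.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Return (P(o | b, a), next belief) for a discrete POMDP."""
    pred = b @ T[a]                # predicted next-state distribution
    unnorm = pred * Z[a][:, o]     # weight by observation likelihood
    p_o = unnorm.sum()
    return p_o, (unnorm / p_o if p_o > 0 else pred)

def fit_value_weights(beliefs, policy, T, Z, R, gamma, phi):
    """Least-squares fit of w in V(b) ~ phi(b) @ w by minimizing the
    squared Bellman residual of a fixed policy over sampled beliefs."""
    rows, targets = [], []
    for b in beliefs:
        a = policy(b)
        expected_next = np.zeros(len(phi(b)))
        for o in range(Z[a].shape[1]):
            p_o, b_next = belief_update(b, a, o, T, Z)
            expected_next += p_o * phi(b_next)
        # Residual at b: (phi(b) - gamma * E[phi(b')]) @ w - r(b, a)
        rows.append(phi(b) - gamma * expected_next)
        targets.append(b @ R[:, a])
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return w

# Hypothetical toy POMDP: 2 states, 2 actions, 2 observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],      # T[a][s, s'] = P(s' | s, a)
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.85, 0.15], [0.15, 0.85]],  # Z[a][s', o] = P(o | s', a)
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],                    # R[s, a] = immediate reward
              [0.0, 0.5]])
gamma = 0.95

# Evenly spaced belief points stand in for sampled reachable beliefs.
beliefs = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 11)]

# Assumed policy: always take action 0.
w = fit_value_weights(beliefs, lambda b: 0, T, Z, R, gamma, phi=lambda b: b)

# Closed-form value vector of the always-action-0 policy, for comparison.
alpha = np.linalg.solve(np.eye(2) - gamma * T[0], R[:, 0])
```

With the identity basis and a constant action, the value of the policy is exactly linear in the belief, so the fitted weights `w` should agree with the closed-form vector `alpha` up to numerical error. Richer bases or belief-dependent policies would generally leave a nonzero residual.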

Duke Scholars

Author: Lawrence Carin, Electrical and Computer Engineering

Published In

Proceedings of the National Conference on Artificial Intelligence

Publication Date

November 13, 2006

Volume

2

Start / End Page

1167 / 1172

Citation

APA

Li, H., Liao, X., & Carin, L. (2006). Incremental least squares policy iteration for POMDPs. Proceedings of the National Conference on Artificial Intelligence, 2, 1167–1172.

Chicago

Li, H., X. Liao, and L. Carin. “Incremental least squares policy iteration for POMDPs.” Proceedings of the National Conference on Artificial Intelligence 2 (November 13, 2006): 1167–72.

ICMJE

Li H, Liao X, Carin L. Incremental least squares policy iteration for POMDPs. Proceedings of the National Conference on Artificial Intelligence. 2006 Nov 13;2:1167–72.

MLA

Li, H., et al. “Incremental least squares policy iteration for POMDPs.” Proceedings of the National Conference on Artificial Intelligence, vol. 2, Nov. 2006, pp. 1167–72.

NLM

Li H, Liao X, Carin L. Incremental least squares policy iteration for POMDPs. Proceedings of the National Conference on Artificial Intelligence. 2006 Nov 13;2:1167–1172.
