Scholars@Duke publication: Point-based policy iteration

Point-based policy iteration

Publication, Journal Article

Shihao, J; Parr, R; Hui, L; Xuejun, L; Carin, L

Published in: Proceedings of the National Conference on Artificial Intelligence

November 28, 2007

We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen's policy iteration with point-based value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: at each iteration before convergence, PBPI produces a policy whose values increase for at least one of a finite set of initial belief states and decrease for none of these states. In contrast, PBVI cannot guarantee monotonic improvement of the value function or the policy. In practice, PBPI generally needs a lower density of belief points in the simplex and tends to produce superior policies with less computation. Experiments on several benchmark problems (up to 12,545 states) demonstrate the scalability and robustness of the PBPI algorithm. Copyright ©2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
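The abstract says PBPI substitutes a PBVI step for Hansen's exact policy-improvement step. As a rough illustration of what a point-based backup does, here is a minimal sketch of a standard PBVI-style backup restricted to a finite set of belief points. Everything in it is an assumption chosen for illustration and does not come from the paper: the Tiger POMDP model, the belief set, the lower-bound initialization, and the hypothetical helper names `point_based_backup` and `value`. It also leaves out PBPI's policy-evaluation and controller-update steps, so it does not reproduce the monotonicity guarantee described above.

```python
# Illustrative sketch only: a PBVI-style point-based backup on the Tiger POMDP.
# The model, belief points, and function names are hypothetical, not from the paper.
from typing import List, Tuple

Belief = List[float]
Alpha = List[float]

STATES = [0, 1]      # 0 = tiger behind left door, 1 = tiger behind right door
ACTIONS = [0, 1, 2]  # 0 = listen, 1 = open left door, 2 = open right door
OBS = [0, 1]         # 0 = hear tiger on the left, 1 = hear tiger on the right
GAMMA = 0.95


def transition(s: int, a: int, s2: int) -> float:
    """P(s2 | s, a): listening keeps the state; opening a door resets it uniformly."""
    if a == 0:
        return 1.0 if s == s2 else 0.0
    return 0.5


def observation(s2: int, a: int, o: int) -> float:
    """P(o | s2, a): listening is 85% accurate; after opening, observations are uninformative."""
    if a == 0:
        return 0.85 if o == s2 else 0.15
    return 0.5


def reward(s: int, a: int) -> float:
    if a == 0:
        return -1.0
    opened_tiger_door = (a == 1 and s == 0) or (a == 2 and s == 1)
    return -100.0 if opened_tiger_door else 10.0


def dot(u: List[float], v: List[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def value(b: Belief, alphas: List[Alpha]) -> float:
    """Piecewise-linear convex value estimate: the max over alpha-vectors at belief b."""
    return max(dot(b, alpha) for alpha in alphas)


def point_based_backup(beliefs: List[Belief], alphas: List[Alpha]) -> Tuple[List[Alpha], List[int]]:
    """Back up the value function only at the points in `beliefs`.

    Returns one new alpha-vector and its greedy action per belief point.
    """
    new_alphas: List[Alpha] = []
    new_actions: List[int] = []
    for b in beliefs:
        best_vec, best_val, best_act = None, float("-inf"), -1
        for a in ACTIONS:
            vec = [reward(s, a) for s in STATES]
            for o in OBS:
                # Project every current alpha-vector back through action a and observation o.
                projected = [
                    [sum(transition(s, a, s2) * observation(s2, a, o) * alpha[s2] for s2 in STATES)
                     for s in STATES]
                    for alpha in alphas
                ]
                # Keep only the projection that scores best at this particular belief point.
                g = max(projected, key=lambda p: dot(b, p))
                vec = [v + GAMMA * gi for v, gi in zip(vec, g)]
            val = dot(b, vec)
            if val > best_val:
                best_vec, best_val, best_act = vec, val, a
        new_alphas.append(best_vec)
        new_actions.append(best_act)
    return new_alphas, new_actions


if __name__ == "__main__":
    beliefs = [[0.5, 0.5], [0.85, 0.15], [0.15, 0.85], [0.99, 0.01], [0.01, 0.99]]
    # Start from a pessimistic lower bound: R_min / (1 - gamma) in every state.
    r_min = min(reward(s, a) for s in STATES for a in ACTIONS)
    alphas = [[r_min / (1 - GAMMA)] * len(STATES)]
    for _ in range(200):
        alphas, actions = point_based_backup(beliefs, alphas)
    for b, a in zip(beliefs, actions):
        print(b, "action", a, "value", round(value(b, alphas), 2))
```

Each backup keeps a single vector per belief point, so nothing in this loop guarantees that the value at a given point rises from one iteration to the next; that is the gap the abstract says PBPI closes by embedding the point-based step inside policy iteration.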

Duke Scholars

Author

Lawrence Carin, Electrical and Computer Engineering

Published In

Proceedings of the National Conference on Artificial Intelligence

Publication Date

November 28, 2007

Volume

2

Start / End Page

1243 / 1249

Citation

APA

Shihao, J., Parr, R., Hui, L., Xuejun, L., & Carin, L. (2007). Point-based policy iteration. Proceedings of the National Conference on Artificial Intelligence, 2, 1243–1249.

Chicago

Shihao, J., R. Parr, L. Hui, L. Xuejun, and L. Carin. “Point-based policy iteration.” Proceedings of the National Conference on Artificial Intelligence 2 (November 28, 2007): 1243–49.

ICMJE

Shihao J, Parr R, Hui L, Xuejun L, Carin L. Point-based policy iteration. Proceedings of the National Conference on Artificial Intelligence. 2007 Nov 28;2:1243–9.

MLA

Shihao, J., et al. “Point-based policy iteration.” Proceedings of the National Conference on Artificial Intelligence, vol. 2, Nov. 2007, pp. 1243–49.

NLM

Shihao J, Parr R, Hui L, Xuejun L, Carin L. Point-based policy iteration. Proceedings of the National Conference on Artificial Intelligence. 2007 Nov 28;2:1243–1249.
