Solving DEC-POMDPs by expectation maximization of value functions
We present a new algorithm called PIEM to approximately solve for the policy of an infinite-horizon decentralized partially observable Markov decision process (DEC-POMDP). The algorithm uses expectation maximization (EM) only in the policy improvement step, with policy evaluation achieved by solving the Bellman equation in terms of finite state controllers (FSCs). This marks a key distinction of PIEM from the previous EM algorithm of (Kumar and Zilberstein, 2010): PIEM operates directly on a DEC-POMDP without transforming it into a mixture of dynamic Bayes nets. Thus, PIEM precisely maximizes the value function, avoiding complicated forward/backward message passing and the corresponding computational and memory cost. To overcome local optima, we follow (Pajarinen and Peltonen, 2011) in solving the DEC-POMDP over a finite horizon and using the resulting policy graph to initialize the FSCs. We solve the finite-horizon problem using a modified point-based policy generation (PBPG) algorithm, in which we provide a closed-form solution to a subproblem that was solved by linear programming in the original PBPG. Experimental results on benchmark problems show that the proposed algorithms compare favorably to state-of-the-art methods.
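For context, the policy evaluation step referred to above can be written, in a common formulation with stochastic FSCs, as a Bellman equation that is linear in the joint value of a world state and a tuple of controller nodes. The following is a sketch under that assumption, with per-agent action-selection probabilities $\pi_i(a_i \mid q_i)$ and node-transition probabilities $\lambda_i(q_i' \mid q_i, a_i, o_i)$ as illustrative notation (not necessarily the paper's own parameterization):

\[
V(s, \vec{q}) \;=\; \sum_{\vec{a}} \prod_{i} \pi_i(a_i \mid q_i)
\Big[ R(s, \vec{a}) \;+\; \gamma \sum_{s', \vec{o}} P(s' \mid s, \vec{a})\, O(\vec{o} \mid s', \vec{a})
\sum_{\vec{q}\,'} \prod_{i} \lambda_i(q_i' \mid q_i, a_i, o_i)\, V(s', \vec{q}\,') \Big].
\]

Because the controller parameters are fixed during evaluation, this is a finite linear system over all $(s, \vec{q})$ pairs and can be solved exactly, which is what allows the improvement step to work with exact values rather than approximate messages.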