Online expectation maximization for reinforcement learning in POMDPs

Publication, Journal Article

Liu, M; Liao, X; Carin, L

Published in: IJCAI International Joint Conference on Artificial Intelligence

December 1, 2013

We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only in the current learning episode, discarding the episode after the evaluation and memorizing the sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has a time complexity of O(n) and a memory complexity of O(1), compared to O(n^2) and O(n) for the corresponding batch-mode algorithm, where n is the number of learning episodes. The online algorithm, which is provably convergent, is demonstrated on five benchmark POMDP problems.
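The complexity contrast in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a much simpler setting in which each episode contributes a hypothetical (value, weight) pair and the "closed-form" estimate is a weighted average. All names here (`RunningStat`, `online_estimates`, `batch_estimates`) are illustrative. The sketch only shows the bookkeeping idea: keeping a fixed-size sufficient statistic gives O(n) total time and O(1) memory, whereas storing and reprocessing every past episode gives O(n^2) total time and O(n) memory.

```python
from dataclasses import dataclass


@dataclass
class RunningStat:
    """Fixed-size sufficient statistic for a weighted average (toy assumption)."""
    weighted_sum: float = 0.0
    weight_total: float = 0.0

    def update(self, value: float, weight: float) -> None:
        # Fold one episode into the statistic; the episode can then be discarded.
        self.weighted_sum += weight * value
        self.weight_total += weight

    def estimate(self) -> float:
        # Closed-form estimate computed from the statistic alone.
        if self.weight_total > 0:
            return self.weighted_sum / self.weight_total
        return 0.0


def online_estimates(episodes):
    """O(1) work per episode, O(1) memory: only the statistic is kept."""
    stat = RunningStat()
    estimates = []
    for value, weight in episodes:
        stat.update(value, weight)
        estimates.append(stat.estimate())
    return estimates


def batch_estimates(episodes):
    """O(k) work at episode k, O(n) memory: every past episode is revisited."""
    history = []
    estimates = []
    for episode in episodes:
        history.append(episode)
        numerator = sum(w * v for v, w in history)
        denominator = sum(w for _, w in history)
        estimates.append(numerator / denominator if denominator > 0 else 0.0)
    return estimates


if __name__ == "__main__":
    toy_episodes = [(1.0, 2.0), (3.0, 1.0), (0.0, 1.0)]  # hypothetical data
    print(online_estimates(toy_episodes))
    print(batch_estimates(toy_episodes))
```

In this toy the per-episode quantities do not depend on the current estimate, so both functions return the same numbers and the batch recomputation is redundant. In the setting the abstract describes, the batch-mode algorithm re-evaluates the policy on stored episodes, which is where the extra cost arises.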

Duke Scholars

Author: Lawrence Carin, Electrical and Computer Engineering

Published In

IJCAI International Joint Conference on Artificial Intelligence

ISSN

1045-0823

Publication Date

December 1, 2013

Start / End Page

1501 / 1507

Citation

APA: Liu, M., Liao, X., & Carin, L. (2013). Online expectation maximization for reinforcement learning in POMDPs. IJCAI International Joint Conference on Artificial Intelligence, 1501–1507.

Chicago: Liu, M., X. Liao, and L. Carin. “Online expectation maximization for reinforcement learning in POMDPs.” IJCAI International Joint Conference on Artificial Intelligence, December 1, 2013, 1501–7.

ICMJE: Liu M, Liao X, Carin L. Online expectation maximization for reinforcement learning in POMDPs. IJCAI International Joint Conference on Artificial Intelligence. 2013 Dec 1;1501–7.

MLA: Liu, M., et al. “Online expectation maximization for reinforcement learning in POMDPs.” IJCAI International Joint Conference on Artificial Intelligence, Dec. 2013, pp. 1501–07.

NLM: Liu M, Liao X, Carin L. Online expectation maximization for reinforcement learning in POMDPs. IJCAI International Joint Conference on Artificial Intelligence. 2013 Dec 1;1501–1507.
