SAMPLE EFFICIENT POLICY GRADIENT METHODS WITH RECURSIVE VARIANCE REDUCTION
Improving the sample efficiency in reinforcement learning has been a longstanding research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\theta)$ (i.e., $\theta$ such that $\|\nabla J(\theta)\|_2^2 \leq \epsilon$). This sample complexity improves upon the existing result $O(1/\epsilon^{5/3})$ for stochastic variance-reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we propose a variant of SRVR-PG with parameter exploration, which samples the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
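The abstract does not spell out the update rule, but the recursive variance reduction it refers to can be illustrated with a SARAH/SPIDER-style estimator applied to REINFORCE-style policy gradients. The sketch below is an assumption-based illustration rather than the paper's algorithm: `sample_episode`, `grad_log_policy`, and all batch sizes and step sizes are hypothetical placeholders, and any correction for the shift in trajectory distribution between consecutive iterates (e.g., importance weighting) is omitted.

```python
import numpy as np


def episode_gradient(theta, sample_episode, grad_log_policy, gamma=0.99):
    """REINFORCE-style gradient estimate from one sampled episode.

    sample_episode(theta) -> (states, actions, rewards) and
    grad_log_policy(theta, s, a) -> d/dtheta log pi_theta(a|s) are hypothetical callbacks.
    """
    states, actions, rewards = sample_episode(theta)
    grad = np.zeros_like(theta)
    ret = 0.0
    # Weight each score-function term by the discounted return that follows it.
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        grad += ret * grad_log_policy(theta, states[t], actions[t])
    return grad


def srvr_pg_sketch(theta0, sample_episode, grad_log_policy,
                   outer_iters=10, inner_iters=5, batch_N=50, batch_B=10, lr=0.01):
    """Minimal sketch of a recursively variance-reduced policy gradient loop
    (gradient ascent on J); all hyperparameters are illustrative, not the paper's."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(outer_iters):
        # Large-batch reference gradient estimate at the snapshot parameter.
        v = np.mean([episode_gradient(theta, sample_episode, grad_log_policy)
                     for _ in range(batch_N)], axis=0)
        theta_prev, theta = theta, theta + lr * v
        for _ in range(inner_iters):
            # Recursive correction: add a small batch of gradient differences between
            # the current and previous parameters to the running estimate v.
            # (A correction for the change of trajectory distribution between theta
            # and theta_prev is deliberately left out of this sketch.)
            diff = np.mean([episode_gradient(theta, sample_episode, grad_log_policy)
                            - episode_gradient(theta_prev, sample_episode, grad_log_policy)
                            for _ in range(batch_B)], axis=0)
            v = v + diff
            theta_prev, theta = theta, theta + lr * v
    return theta
```

The point of the recursion is that only the occasional reference step pays for a large batch of episodes, while each inner step updates the running estimate `v` with a small batch of gradient differences, which is the general mechanism by which recursive variance reduction lowers the number of sampled episodes.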