Scholars@Duke publication: A finite-time analysis of two time-scale actor-critic methods

A finite-time analysis of two time-scale actor-critic methods

Publication, Conference

Wu, Y; Zhang, W; Xu, P; Gu, Q

Published in: Advances in Neural Information Processing Systems

January 1, 2020

Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning to estimate the policy gradient. Under the two time-scale learning rate schedule, the asymptotic convergence of AC has been well studied in the literature. However, the non-asymptotic convergence and finite sample complexity of actor-critic methods are largely open. In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\|\nabla J(\theta)\|_2^2 \le \epsilon$) of the non-concave performance function $J(\theta)$, with $\tilde{O}(\epsilon^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
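
To make the two time-scale idea concrete, the following is a minimal illustrative sketch and not the algorithm analyzed in the paper. It assumes a hypothetical two-state, two-action MDP, a tabular softmax actor, a tabular TD(0) critic, and a discounted objective. The step-size exponents (0.6 for the actor, 0.4 for the critic) and the constant 0.5 are assumptions chosen only so that the critic moves on the faster time scale. Samples come from a single trajectory, so consecutive samples are not i.i.d.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def step(state, action, rng):
    """Hypothetical 2-state MDP: action 1 usually leads to state 1, which pays reward 1."""
    p_to_one = 0.9 if action == 1 else 0.1
    next_state = 1 if rng.random() < p_to_one else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def two_timescale_actor_critic(num_steps=20000, gamma=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: action preferences per state
    w = [0.0, 0.0]                    # critic: tabular state-value estimates
    state = 0
    for t in range(num_steps):
        # Assumed schedules: the actor step size decays faster than the critic's,
        # so the critic effectively tracks the current policy.
        alpha = 0.5 / (1 + t) ** 0.6  # actor (slow time scale)
        beta = 0.5 / (1 + t) ** 0.4   # critic (fast time scale)

        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        next_state, reward = step(state, action, rng)

        # Critic: TD(0) update; the TD error also serves as the advantage estimate.
        td_error = reward + gamma * w[next_state] - w[state]
        w[state] += beta * td_error

        # Actor: stochastic policy-gradient step using grad log pi(a|s) for softmax.
        for a in range(2):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += alpha * td_error * grad_log

        state = next_state  # continue along the same Markovian trajectory
    return theta, w


if __name__ == "__main__":
    theta, w = two_timescale_actor_critic()
    print("policy in state 0:", softmax(theta[0]))
    print("policy in state 1:", softmax(theta[1]))
    print("value estimates:", w)
```

In this toy problem, action 1 has the higher expected return in both states, so the learned policy should shift probability toward it as training proceeds.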

Duke Scholars

Author Pan Xu Biostatistics & Bioinformatics, Division of Integrative Geno ...

Published In

Advances in Neural Information Processing Systems

ISSN

1049-5258

Publication Date

January 1, 2020

Volume

2020-December

Related Subject Headings

4611 Machine learning
1702 Cognitive Sciences
1701 Psychology

Citation

APA

Wu, Y., Zhang, W., Xu, P., & Gu, Q. (2020). A finite-time analysis of two time-scale actor-critic methods. In Advances in Neural Information Processing Systems (Vol. 2020-December).
