
A finite-time analysis of two time-scale actor-critic methods

Conference Publication
Wu, Y.; Zhang, W.; Xu, P.; Gu, Q.
Published in: Advances in Neural Information Processing Systems
January 1, 2020

Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms: the actor uses the policy gradient to improve the policy, while the critic uses temporal difference (TD) learning to estimate the policy gradient. Under the two time-scale learning rate schedule, the asymptotic convergence of AC has been well studied in the literature. However, the non-asymptotic convergence and finite sample complexity of actor-critic methods remain largely open. In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under a non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., ‖∇J(θ)‖₂² ≤ ε) of the non-concave performance function J(θ), with Õ(ε^{-2.5}) sample complexity. To the best of our knowledge, this is the first work providing a finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
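The two time-scale scheme the abstract describes can be sketched in a few lines: the critic runs a TD(0) update with a step size that decays more slowly than the actor's policy-gradient step size, so the critic tracks the slowly moving policy. The sketch below is an illustrative assumption, not the paper's setup: the toy 2-state MDP, the decay exponents 0.6 and 0.8, and the tabular softmax parameterization are all chosen only to make the two-time-scale structure concrete.

```python
import math
import random

random.seed(0)

# Hypothetical toy MDP (illustration only): two states {0, 1}, two actions.
# Action 0 keeps the current state, action 1 flips it; being in state 1 pays 1.
def step(s, a):
    s_next = s if a == 0 else 1 - s
    return s_next, float(s_next == 1)

def softmax_probs(theta, s):
    prefs = [theta[(s, a)] for a in (0, 1)]
    m = max(prefs)  # subtract max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(probs):
    return 0 if random.random() < probs[0] else 1

# Tabular parameters: critic state values w[s], actor preferences theta[(s, a)].
w = {0: 0.0, 1: 0.0}
theta = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
gamma = 0.9

s = 0
for t in range(1, 20001):
    # Two time-scale schedule: the critic step size alpha_t decays more
    # slowly than the actor step size beta_t (beta_t / alpha_t -> 0), the
    # regime in which the critic can track the slowly changing policy.
    alpha_t = 1.0 / t ** 0.6   # critic: faster time scale
    beta_t = 1.0 / t ** 0.8    # actor: slower time scale

    probs = softmax_probs(theta, s)
    a = sample_action(probs)
    s_next, r = step(s, a)

    # Critic: one TD(0) update of the state values.
    delta = r + gamma * w[s_next] - w[s]
    w[s] += alpha_t * delta

    # Actor: policy-gradient step, using the TD error as the advantage
    # estimate; for a softmax policy, grad log pi(a|s) along theta[(s, a')]
    # is (1{a' = a} - pi(a'|s)).
    for a2 in (0, 1):
        grad = (1.0 if a2 == a else 0.0) - probs[a2]
        theta[(s, a2)] += beta_t * delta * grad

    s = s_next

# After training, the policy in state 0 should prefer flipping to state 1.
p0 = softmax_probs(theta, 0)
print(p0)
```

Note that both updates use a single transition from the same trajectory, so the samples the critic sees are Markovian rather than i.i.d., which is exactly the non-i.i.d. setting the analysis addresses.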


Published In

Advances in Neural Information Processing Systems

ISSN

1049-5258

Publication Date

January 1, 2020

Volume

2020-December

Related Subject Headings

  • 4611 Machine learning
  • 1702 Cognitive Sciences
  • 1701 Psychology
 

Citation

Wu, Y., Zhang, W., Xu, P., & Gu, Q. (2020). A finite-time analysis of two time-scale actor-critic methods. In Advances in Neural Information Processing Systems (Vol. 2020-December).
