Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator

Conference Paper

Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model, 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest, 3) they inherently allow for richly parameterized policies. A notable drawback is that even in the most basic continuous control problem (that of linear quadratic regulators), these methods must solve a non-convex optimization problem, where little is understood about their efficiency from both computational and statistical perspectives. In contrast, system identification and model-based planning in optimal control theory have a much more solid theoretical footing, where much is known with regards to their computational and statistical properties. This work bridges this gap, showing that (model-free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem-dependent quantities) with regards to their sample and computational complexities.
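The setting in the abstract is model-free policy gradient on the LQR cost C(K) of a linear policy u = -Kx, using only cost evaluations of sampled trajectories. Below is a minimal sketch of a zeroth-order policy gradient of that flavor on a toy two-state system; the dynamics (A, B), cost weights (Q, R), horizon, perturbation radius, and step size are illustrative assumptions, not values or guarantees taken from the paper.

```python
import numpy as np

# Toy LQR instance; all numerical values here are illustrative
# assumptions, not taken from the paper.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition matrix
B = np.array([[0.0], [0.1]])             # control input matrix
Q, R = np.eye(2), np.eye(1)              # state and control cost weights
T = 50                                   # rollout horizon

rng = np.random.default_rng(0)
X0 = rng.standard_normal((20, 2))        # fixed batch of initial states

def cost(K, x0):
    """Finite-horizon LQR cost of the linear policy u = -K x from x0."""
    x, c = x0, 0.0
    for _ in range(T):
        u = -K @ x
        c += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return c

def avg_cost(K):
    """Average cost over the fixed batch of initial states."""
    return np.mean([cost(K, x0) for x0 in X0])

def grad_estimate(K, radius=0.05, n_samples=40):
    """Zeroth-order gradient estimate of C(K) from cost evaluations only:
    perturb K on a sphere of the given radius and correlate the observed
    cost with the perturbation (baseline-subtracted to reduce variance)."""
    baseline, g = avg_cost(K), np.zeros_like(K)
    for _ in range(n_samples):
        U = rng.standard_normal(K.shape)
        U *= radius / np.linalg.norm(U)   # uniform direction, fixed radius
        g += (avg_cost(K + U) - baseline) * U
    return (K.size / (n_samples * radius**2)) * g

K = np.zeros((1, 2))                      # start from the zero policy
for _ in range(200):
    K -= 1e-4 * grad_estimate(K)          # model-free gradient descent on C(K)
print("learned gain K:", K, "  cost:", avg_cost(K))
```

The estimator above is the standard one-point smoothing estimator with a baseline; the paper's contribution is showing that, although C(K) is non-convex in K, such local search provably converges to the global optimum with polynomial sample and computational complexity.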

Cited Authors

  • Fazel, M; Ge, R; Kakade, SM; Mesbahi, M

Published Date

  • January 1, 2018

Published In

  • 35th International Conference on Machine Learning, ICML 2018

Volume / Issue

  • 4

Start / End Page

  • 2385 - 2413

International Standard Book Number 13 (ISBN-13)

  • 9781510867963

Citation Source

  • Scopus