Off-policy reinforcement learning with Gaussian processes

Journal Article

An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed in addition to its convergence guarantees and its ability to automatically choose its own basis locations.

Cited Authors

  • Chowdhary, G; Liu, M; Grande, R; Walsh, T; How, J; Carin, L

Published Date

  • July 1, 2014

Published In

Volume / Issue

  • 1 / 3

Start / End Page

  • 227 - 238

Electronic International Standard Serial Number (EISSN)

  • 2329-9274

International Standard Serial Number (ISSN)

  • 2329-9266

Digital Object Identifier (DOI)

  • 10.1109/JAS.2014.7004680

Citation Source

  • Scopus