Reinforcement learning via kernel temporal difference.


Conference Paper

This paper introduces kernel Temporal Difference (TD)(λ), a kernel adaptive filter trained with stochastic gradient updates on temporal differences, to estimate the state-action value function in reinforcement learning. The paper studies the case λ=0. Experimental results show the method's applicability to decoding motor states during a center-out reaching task performed by a monkey. The results are compared with those of a time delay neural network (TDNN) trained by backpropagating the temporal difference error. In these experiments, kernel TD(0) converges faster and reaches a better solution than the neural network.
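The idea of a kernel adaptive filter driven by the TD error can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian kernel, a fixed step size, and a state-value function rather than the state-action value function studied in the paper. The class name `KernelTD0`, its parameters (`eta`, `gamma`, `sigma`), and the two-state toy chain are all hypothetical and chosen only for illustration.

```python
import numpy as np


class KernelTD0:
    """Illustrative kernel TD(0) value estimator (assumed form, not the paper's code).

    The value estimate is a kernel expansion V(x) = sum_i a_i * k(c_i, x).
    Each update adds the visited state as a new center whose coefficient is
    the step size times the TD error, in the style of kernel least mean squares.
    """

    def __init__(self, eta=0.5, gamma=0.9, sigma=1.0):
        self.eta = eta        # step size (assumed constant)
        self.gamma = gamma    # discount factor
        self.sigma = sigma    # Gaussian kernel width (assumed kernel choice)
        self.centers = []     # stored input states
        self.coeffs = []      # one coefficient per stored state

    def _kernel(self, x, y):
        diff = np.atleast_1d(np.asarray(x, dtype=float)) - np.atleast_1d(
            np.asarray(y, dtype=float)
        )
        return float(np.exp(-np.dot(diff, diff) / (2.0 * self.sigma ** 2)))

    def value(self, x):
        # Evaluate the current kernel expansion at x.
        return sum(a * self._kernel(c, x) for c, a in zip(self.centers, self.coeffs))

    def update(self, x, reward, x_next=None):
        # x_next=None marks a terminal transition, whose next value is zero.
        v_next = 0.0 if x_next is None else self.value(x_next)
        td_error = reward + self.gamma * v_next - self.value(x)
        self.centers.append(np.atleast_1d(np.asarray(x, dtype=float)))
        self.coeffs.append(self.eta * td_error)
        return td_error


# Toy two-state chain (hypothetical): s0 -> s1 with reward 0, s1 -> terminal with reward 1.
model = KernelTD0(eta=0.5, gamma=0.9, sigma=1.0)
s0, s1 = [0.0], [10.0]
for _ in range(50):
    model.update(s0, 0.0, s1)
    model.update(s1, 1.0, None)
```

On this toy chain the two states are far enough apart that their kernels barely overlap, so the estimates approach the true values 0.9 for `s0` and 1.0 for `s1`. Note that this naive version stores one center per update, so the expansion grows with the number of samples; practical kernel adaptive filters usually add some form of sparsification.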

Cited Authors

  • Bae, J; Chhatbar, P; Francis, JT; Sanchez, JC; Principe, JC

Published Date

  • January 2011

Published In

Volume / Issue

  • 2011 /

Start / End Page

  • 5662 - 5665

PubMed ID

  • 22255624

PubMed Central ID

  • 22255624

International Standard Serial Number (ISSN)

  • 1557-170X

Digital Object Identifier (DOI)

  • 10.1109/iembs.2011.6091370