TD γ : Re-evaluating complex backups in temporal difference learning

Publication, Journal Article
Konidaris, G; Niekum, S; Thomas, PS
Published in: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
2011

We show that the λ-return target used in the TD(λ) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the γ-return estimator, an alternative target based on a more accurate model of variance, which defines the TD γ family of complex-backup temporal difference learning algorithms. We derive TD γ, the γ-return equivalent of the original TD(λ) algorithm, which eliminates the λ parameter but can only perform updates at the end of an episode and requires time and space proportional to the episode length. We then derive a second algorithm, TD γ(C), with a capacity parameter C. TD γ(C) requires C times more time and memory than TD(λ) and is incremental and online. We show that TD γ outperforms TD(λ) for any setting of λ on 4 out of 5 benchmark domains, and that TD γ(C) performs as well as or better than TD γ for intermediate settings of C.
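The idea of a complex backup as a maximum likelihood estimate can be illustrated with a small sketch. If the n-step returns are treated as independent, unbiased Gaussian estimates of the same value, the maximum likelihood combination weights each one by the inverse of its variance. The code below is an illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The geometric variance model (variance proportional to λ^(-n)) is chosen because it yields normalised weights proportional to λ^(n-1); the conventional episodic λ-return instead assigns the leftover weight to the final return, which this sketch does not reproduce. The accumulating variance model used for the γ-style weights (a sum of γ^(2(i-1)) terms) is an assumption about the paper's model that the abstract does not spell out, and should be checked against the paper itself.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float],
                   next_values: Sequence[float],
                   gamma: float) -> List[float]:
    """Return [R^(1), ..., R^(L)] for one episode.

    rewards[n-1] is the reward received on step n, and next_values[n-1]
    is the current value estimate of the state reached after step n
    (0.0 for a terminal state).
    """
    returns = []
    discounted_sum = 0.0
    for n, (reward, value) in enumerate(zip(rewards, next_values), start=1):
        discounted_sum += gamma ** (n - 1) * reward
        returns.append(discounted_sum + gamma ** n * value)
    return returns


def inverse_variance_weights(variances: Sequence[float]) -> List[float]:
    """Maximum likelihood weights for independent Gaussian estimates."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def geometric_variances(length: int, lam: float) -> List[float]:
    """Assumed model: variance of the n-step return grows as lam ** (-n)."""
    return [lam ** (-n) for n in range(1, length + 1)]


def accumulating_variances(length: int, gamma: float) -> List[float]:
    """Assumed model: each extra step adds gamma ** (2 * (n - 1)) variance."""
    variances = []
    accumulated = 0.0
    for n in range(1, length + 1):
        accumulated += gamma ** (2 * (n - 1))
        variances.append(accumulated)
    return variances


def weighted_return(returns: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Combine the n-step returns into a single update target."""
    return sum(w * g for w, g in zip(weights, returns))


if __name__ == "__main__":
    gamma = 0.9
    returns = n_step_returns([0.0, 0.0, 1.0], [0.2, 0.5, 0.0], gamma)
    lam_w = inverse_variance_weights(geometric_variances(len(returns), 0.8))
    gam_w = inverse_variance_weights(accumulating_variances(len(returns), gamma))
    print("lambda-style target:", weighted_return(returns, lam_w))
    print("gamma-style target: ", weighted_return(returns, gam_w))
```

Under the second variance model the weights depend only on the discount factor and the episode length, which is consistent with the abstract's statement that the λ parameter is eliminated at the cost of processing whole episodes.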

Published In

Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011

Publication Date

2011
 

Citation

APA: Konidaris, G., Niekum, S., & Thomas, P. S. (2011). TD γ : Re-evaluating complex backups in temporal difference learning. Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011.
Chicago: Konidaris, G., S. Niekum, and P. S. Thomas. “TD γ : Re-evaluating complex backups in temporal difference learning.” Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011, 2011.
ICMJE: Konidaris G, Niekum S, Thomas PS. TD γ : Re-evaluating complex backups in temporal difference learning. Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011. 2011;
MLA: Konidaris, G., et al. “TD γ : Re-evaluating complex backups in temporal difference learning.” Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011, 2011.
NLM: Konidaris G, Niekum S, Thomas PS. TD γ : Re-evaluating complex backups in temporal difference learning. Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011. 2011;
