Policy evaluation using the Ω-return
Conference publication
Thomas, PS; Niekum, S; Theocharous, G; Konidaris, G
Published in: Advances in Neural Information Processing Systems
January 1, 2015
We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of returns of different lengths. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies suggesting that it is superior to the λ-return and the γ-return for a variety of problems.
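As a rough illustration of what accounting for the correlation of returns of different lengths can mean, the sketch below contrasts the fixed weights of the episodic λ-return with the textbook minimum-variance weighting of correlated estimators, w = Ω⁻¹1 / (1ᵀΩ⁻¹1), where Ω is the covariance matrix of the n-step returns. This is a hypothetical sketch, not the paper's algorithm: it assumes Ω is known and invertible and that every n-step return estimates the same expected value. The function names and the numbers in the example are illustrative assumptions.

```python
import numpy as np


def lambda_weights(num_returns: int, lam: float) -> np.ndarray:
    """Weights the episodic lambda-return places on the 1-step, ..., L-step returns.

    Returns with n < L get (1 - lam) * lam**(n - 1); the last (complete)
    return gets the leftover mass lam**(L - 1), so the weights sum to 1.
    """
    n = np.arange(1, num_returns + 1)
    w = (1.0 - lam) * lam ** (n - 1)
    w[-1] = lam ** (num_returns - 1)
    return w


def min_variance_weights(omega: np.ndarray) -> np.ndarray:
    """Minimum-variance weights for correlated estimators (illustrative assumption).

    Solves min_w w^T omega w subject to sum(w) = 1, giving
    w = omega^{-1} 1 / (1^T omega^{-1} 1). Assumes omega (the covariance
    matrix of the returns) is known and invertible. Weights may be negative.
    """
    ones = np.ones(omega.shape[0])
    x = np.linalg.solve(omega, ones)  # omega^{-1} 1 without forming the inverse
    return x / (ones @ x)


def variance(weights: np.ndarray, omega: np.ndarray) -> float:
    """Variance of the weighted return sum_n w_n G^(n) under covariance omega."""
    return float(weights @ omega @ weights)


# Hypothetical covariance of the 1-, 2- and 3-step returns: longer returns are
# noisier, and overlapping returns are positively correlated.
omega = np.array([[1.0, 0.8, 0.6],
                  [0.8, 2.0, 1.5],
                  [0.6, 1.5, 3.0]])
returns = np.array([1.2, 0.9, 1.5])  # hypothetical sampled n-step returns

w_lam = lambda_weights(3, lam=0.5)
w_min = min_variance_weights(omega)
print("lambda-return:", w_lam @ returns, "variance:", variance(w_lam, omega))
print("min-variance: ", w_min @ returns, "variance:", variance(w_min, omega))
```

Under these assumptions the λ-return's weights ignore Ω, so its variance can only match or exceed that of the minimum-variance combination. In practice Ω is not known and would have to be estimated, which is consistent with the abstract's point that an approximation is needed.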
Published In
Advances in Neural Information Processing Systems
ISSN
1049-5258
Publication Date
January 1, 2015
Volume
2015-January
Start / End Page
334 / 342
Related Subject Headings
- 1702 Cognitive Sciences
- 1701 Psychology
Citation
- APA: Thomas, P. S., Niekum, S., Theocharous, G., & Konidaris, G. (2015). Policy evaluation using the Ω-return. In Advances in Neural Information Processing Systems (Vol. 2015-January, pp. 334–342).
- Chicago: Thomas, P. S., S. Niekum, G. Theocharous, and G. Konidaris. “Policy evaluation using the Ω-return.” In Advances in Neural Information Processing Systems, 2015-January:334–42, 2015.
- ICMJE: Thomas PS, Niekum S, Theocharous G, Konidaris G. Policy evaluation using the Ω-return. In: Advances in Neural Information Processing Systems. 2015. p. 334–42.
- MLA: Thomas, P. S., et al. “Policy evaluation using the Ω-return.” Advances in Neural Information Processing Systems, vol. 2015-January, 2015, pp. 334–42.
- NLM: Thomas PS, Niekum S, Theocharous G, Konidaris G. Policy evaluation using the Ω-return. Advances in Neural Information Processing Systems. 2015. p. 334–342.