Steering Decision Transformers via Temporal Difference Learning
Decision Transformers (DTs) have proven highly effective for offline reinforcement learning (RL), successfully modeling sequences of actions in a given set of demonstrations. However, DTs may perform poorly in stochastic environments, which are prevalent in robotics. In this paper, we identify the root cause of this performance degradation: the variance of returns-to-go, the conditioning signal DTs use to predict actions, grows as it accumulates over the horizon. Building on this insight, we propose an extension to DTs that steers them toward high-reward regions, with the expected returns estimated via temporal difference learning. This not only mitigates the growing-variance problem but also eliminates the need for DTs to access returns-to-go during evaluation and deployment. We show that our method outperforms state-of-the-art offline RL methods in both simulated and real-world robotic arm environments.
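To make the steering idea concrete, the sketch below illustrates one possible realization under stated assumptions, not the paper's actual implementation: a state-value critic is trained with a TD(0) target on offline transitions, and at evaluation time the DT is conditioned on the critic's value estimates rather than on ground-truth returns-to-go. The names `ValueCritic`, `td_update`, `select_action`, and the assumed `dt_policy(states, actions, returns, timesteps)` signature are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation): condition a Decision
# Transformer on TD-learned value estimates instead of returns-to-go.
import torch
import torch.nn as nn


class ValueCritic(nn.Module):
    """State-value network V(s), trained with a temporal-difference target."""

    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)


def td_update(critic, optimizer, batch, gamma: float = 0.99):
    """One TD(0) step on an offline batch of (s, r, s', done) transitions."""
    s, r, s_next, done = batch
    with torch.no_grad():
        # Bootstrapped target avoids summing rewards over the full horizon,
        # which is what makes Monte Carlo returns-to-go high-variance.
        target = r + gamma * (1.0 - done) * critic(s_next)
    loss = nn.functional.mse_loss(critic(s), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def select_action(dt_policy, critic, states, actions, timesteps):
    """At deployment, feed V(s) as the return-conditioning signal, so no
    ground-truth returns-to-go are required."""
    returns_cond = critic(states)  # estimated expected returns, shape (T,)
    return dt_policy(states, actions, returns_cond, timesteps)
```

In this sketch the only change relative to a standard DT evaluation loop is the source of the return-conditioning token: it comes from the learned critic rather than from a user-specified or logged return target.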