
Action-Dependent Optimality-Preserving Reward Shaping

Publication, Conference
Forbes, GC; Wang, J; Villalobos-Arias, L; Jhala, A; Roberts, DL
Published in: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS)
January 1, 2025

Recent RL research has utilized reward shaping, particularly complex shaping rewards such as intrinsic motivation (IM), to encourage agent exploration in sparse-reward environments. While often effective, "reward hacking" can lead to the shaping reward being optimized at the expense of the extrinsic reward. Prior techniques have mitigated this, allowing IM to be implemented without altering optimal policies, but have thus far only been tested in simple environments. In this work we show that they are effectively unsuitable for complex, exploration-heavy environments with long episodes. To remedy this, we introduce Action-Dependent Optimality-Preserving Shaping (ADOPS), a method of converting arbitrary intrinsic rewards to an optimality-preserving form that allows agents to utilize them more effectively in the extremely sparse environment of Montezuma's Revenge. We demonstrate significant improvement over prior state-of-the-art optimality-preserving IM-conversion methods, and argue that these improvements come from ADOPS's ability to preserve 'action-dependent' IM terms.
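For context on the "optimality-preserving form" the abstract refers to, a minimal sketch of classic potential-based reward shaping (Ng et al., 1999) is shown below; this is the standard optimality-preserving construction that action-dependent methods such as ADOPS generalize, not the paper's own algorithm. All function and variable names here are hypothetical.

```python
# Illustrative sketch (NOT the paper's ADOPS method): potential-based
# reward shaping, the classic optimality-preserving form. Adding
# F(s, s') = gamma * Phi(s') - Phi(s) to any extrinsic reward leaves
# the set of optimal policies unchanged, because F telescopes along
# every trajectory.

def make_shaped_reward(reward_fn, potential, gamma=0.99):
    """Wrap an extrinsic reward function with a potential-based term."""
    def shaped(s, a, s_next):
        return reward_fn(s, a, s_next) + gamma * potential(s_next) - potential(s)
    return shaped

# Toy usage: a 1-D chain where reaching state 5 yields reward 1.
extrinsic = lambda s, a, s_next: 1.0 if s_next == 5 else 0.0
phi = lambda s: -abs(5 - s)          # hypothetical distance-based potential
shaped = make_shaped_reward(extrinsic, phi, gamma=1.0)

# The shaping terms telescope: the total shaped return along any
# trajectory differs from the extrinsic return only by
# phi(final) - phi(start), so relative policy ordering is preserved.
traj = [0, 1, 2, 3, 4, 5]
total = sum(shaped(s, None, s2) for s, s2 in zip(traj, traj[1:]))
# extrinsic return = 1.0; telescoped shaping = phi(5) - phi(0) = 5.0
```

Note that this classic form depends only on states; the paper's contribution concerns preserving shaping terms that also depend on the action, which this sketch does not capture.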


Published In

Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS)

EISSN

1558-2914

ISSN

1548-8403

Publication Date

January 1, 2025

Start / End Page

2523 / 2525
 

Citation

APA: Forbes, G. C., Wang, J., Villalobos-Arias, L., Jhala, A., & Roberts, D. L. (2025). Action-Dependent Optimality-Preserving Reward Shaping. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS) (pp. 2523–2525).
Chicago: Forbes, G. C., J. Wang, L. Villalobos-Arias, A. Jhala, and D. L. Roberts. "Action-Dependent Optimality-Preserving Reward Shaping." In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2523–25, 2025.
ICMJE: Forbes GC, Wang J, Villalobos-Arias L, Jhala A, Roberts DL. Action-Dependent Optimality-Preserving Reward Shaping. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS). 2025. p. 2523–5.
MLA: Forbes, G. C., et al. "Action-Dependent Optimality-Preserving Reward Shaping." Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2025, pp. 2523–25.
NLM: Forbes GC, Wang J, Villalobos-Arias L, Jhala A, Roberts DL. Action-Dependent Optimality-Preserving Reward Shaping. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS). 2025. p. 2523–2525.
