Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return

Conference Publication
Peng, N.; Tian, M.; Fain, B.
Published in: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
January 1, 2025

We study multi-objective reinforcement learning with nonlinear preferences over trajectories. That is, we maximize the expected value of a nonlinear function of accumulated rewards (the expected scalarized return, or ESR) in a multi-objective Markov Decision Process (MOMDP). We derive an extended form of Bellman optimality for nonlinear optimization that explicitly conditions on time and the reward accumulated so far. Using this formulation, we describe an approximation algorithm that computes an approximately optimal non-stationary policy in pseudopolynomial time for smooth scalarization functions with a constant number of reward objectives. We prove the approximation guarantee analytically and demonstrate the algorithm experimentally, showing that there can be a substantial gap in achieved scalarized return between the policy our algorithm computes and alternative baselines.
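
The augmented Bellman formulation described in the abstract lends itself to a compact dynamic-programming sketch. The Python below is illustrative only, not the authors' implementation: the toy MOMDP (P, R), the scalarization f, the horizon H, and the grid resolution STEP are all invented for the example. It augments each state with the timestep and a discretized accumulated-reward vector, applies the scalarization only at the horizon, and reads off a non-stationary policy.

    # Minimal sketch of a time- and reward-augmented dynamic program for ESR.
    # All names and numbers below are hypothetical, chosen for illustration.
    import numpy as np

    H = 4                      # finite horizon
    STATES = [0, 1]
    ACTIONS = [0, 1]
    D = 2                      # number of reward objectives (constant)
    STEP = 1.0                 # grid resolution for accumulated rewards

    # P[s][a] = list of (next_state, probability)
    P = {0: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
         1: {0: [(0, 1.0)],           1: [(0, 0.3), (1, 0.7)]}}
    # R[s][a] = immediate reward vector, one entry per objective
    R = {0: {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])},
         1: {0: np.array([0.0, 1.0]), 1: np.array([1.0, 1.0])}}

    def f(w):
        """Hypothetical smooth, nonlinear scalarization (Nash-style product)."""
        return float(np.prod(1.0 + np.maximum(w, 0.0)))

    def snap(w):
        """Round an accumulated-reward vector onto the discretization grid."""
        return tuple(np.round(np.asarray(w) / STEP) * STEP)

    # V[(t, s, w)] = best achievable E[f(final accumulated reward)] from
    # state s at time t, having already banked the reward vector w.
    V, policy = {}, {}

    def value(t, s, w):
        w = snap(w)
        if t == H:                       # horizon reached: scalarize once
            return f(np.asarray(w))
        key = (t, s, w)
        if key in V:
            return V[key]
        best, best_a = -np.inf, None
        for a in ACTIONS:
            # Expected value of taking a: bank R[s][a], then continue.
            q = sum(p * value(t + 1, s2, snap(np.asarray(w) + R[s][a]))
                    for s2, p in P[s][a])
            if q > best:
                best, best_a = q, a
        V[key], policy[key] = best, best_a
        return best

    print("optimal ESR from s=0:", value(0, 0, np.zeros(D)))

In a sketch like this, the pseudopolynomial running time comes from the number of reachable grid points for the accumulated-reward vector, which scales with the magnitude of the rewards and the inverse grid resolution; the resulting policy is non-stationary because the chosen action depends on t and w, not just on s.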


Published In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
EISSN: 1558-2914
ISSN: 1548-8403
Publication Date: January 1, 2025
Start / End Page: 1632 / 1640

Citation

APA: Peng, N., Tian, M., & Fain, B. (2025). Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS (pp. 1632–1640).
Chicago: Peng, N., M. Tian, and B. Fain. “Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return.” In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, 1632–40, 2025.
ICMJE: Peng N, Tian M, Fain B. Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS. 2025. p. 1632–40.
MLA: Peng, N., et al. “Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return.” Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2025, pp. 1632–40.
NLM: Peng N, Tian M, Fain B. Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS. 2025. p. 1632–1640.
