Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation

Publication, Conference
Hsu, HL; Pajic, M
Published in: Proceedings of Machine Learning Research
January 1, 2025

Cooperative multi-agent reinforcement learning (MARL) has demonstrated significant promise in dynamic control environments, where effective communication and tailored exploration strategies facilitate collaboration. However, ensuring safe exploration remains challenging, as even a single unsafe action from any agent may result in catastrophic consequences. To mitigate this risk, we introduce Scoop-LSVI, a UCB-based cooperative parallel RL framework that achieves low cumulative regret under minimal communication overhead while adhering to safety constraints. Scoop-LSVI enables multiple agents to solve isolated Markov Decision Processes (MDPs) concurrently and share information to enhance collective learning efficiency. We establish a regret bound of $\tilde{O}(\kappa d^{3/2} H^{2} \sqrt{MK})$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, $K$ is the number of episodes per agent, and $\kappa$ quantifies safety constraints. Our result aligns with state-of-the-art findings for unsafe cooperative MARL and matches the regret bound of UCB-based safe single-agent RL algorithms when $M = 1$, highlighting the potential of Scoop-LSVI to support safe and efficient learning in cooperative MARL applications.
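
As a quick check of the single-agent claim, the stated bound can be specialized to M = 1. The lines below are only an arithmetic reading of the bound as quoted in the abstract, with constants and logarithmic factors hidden by the Õ notation; they are not taken from the paper's derivation.

```latex
\[
  \tilde{O}\!\left(\kappa\, d^{3/2} H^{2} \sqrt{MK}\right)\Big|_{M=1}
  \;=\;
  \tilde{O}\!\left(\kappa\, d^{3/2} H^{2} \sqrt{K}\right)
\]
```

Since each of the M agents runs K episodes, MK is the total number of episodes across all agents, so the bound grows with the square root of that total.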

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2025

Volume

283

Start / End Page

1353 / 1364
 

Citation

APA: Hsu, H. L., & Pajic, M. (2025). Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation. In Proceedings of Machine Learning Research (Vol. 283, pp. 1353–1364).

Chicago: Hsu, H. L., and M. Pajic. “Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation.” In Proceedings of Machine Learning Research, 283:1353–64, 2025.

ICMJE: Hsu HL, Pajic M. Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation. In: Proceedings of Machine Learning Research. 2025. p. 1353–64.

MLA: Hsu, H. L., and M. Pajic. “Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation.” Proceedings of Machine Learning Research, vol. 283, 2025, pp. 1353–64.

NLM: Hsu HL, Pajic M. Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation. Proceedings of Machine Learning Research. 2025. p. 1353–1364.
