Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Publication, Journal Article
Zhang, Y; Qu, G; Xu, P; Lin, Y; Chen, Z; Wierman, A
Published in: Performance Evaluation Review
June 19, 2023

We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its κ-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in κ. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing κ. Numerical simulations demonstrate the effectiveness of LPI. This extended abstract is an abridged version of [12].
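The abstract's restriction of each agent's attention to its κ-hop neighborhood can be illustrated with a short sketch. It is not taken from the paper: the function name `k_hop_neighborhood`, the adjacency-dictionary representation, and the six-agent line network are all assumptions made for illustration. It only shows which agents fall inside the local information set for a given κ, using a breadth-first search that stops expanding at depth κ.

```python
from collections import deque


def k_hop_neighborhood(adjacency, agent, kappa):
    """Return the set of agents within `kappa` hops of `agent`, itself included.

    `adjacency` maps each agent to an iterable of its direct neighbors
    (an assumed representation of the interaction network).
    """
    dist = {agent: 0}
    queue = deque([agent])
    while queue:
        node = queue.popleft()
        if dist[node] == kappa:
            continue  # do not expand past the kappa-hop boundary
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


# Hypothetical 6-agent line network: 0 - 1 - 2 - 3 - 4 - 5
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

for kappa in range(3):
    print(kappa, sorted(k_hop_neighborhood(line, 2, kappa)))
```

In this toy network the neighborhood of agent 2 grows by at most two agents per unit of κ, so the amount of local information an agent handles increases with κ. That is the computational side of the trade-off the abstract describes, set against the optimality gap that shrinks as κ grows.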

Published In

Performance Evaluation Review

DOI

10.1145/3606376.3593545

ISSN

0163-5999

Publication Date

June 19, 2023

Volume

51

Issue

1

Start / End Page

83 / 84

Related Subject Headings

  • Networking & Telecommunications
 

Citation

APA: Zhang, Y., Qu, G., Xu, P., Lin, Y., Chen, Z., & Wierman, A. (2023). Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. Performance Evaluation Review, 51(1), 83–84. https://doi.org/10.1145/3606376.3593545

Chicago: Zhang, Y., G. Qu, P. Xu, Y. Lin, Z. Chen, and A. Wierman. “Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning.” Performance Evaluation Review 51, no. 1 (June 19, 2023): 83–84. https://doi.org/10.1145/3606376.3593545.

ICMJE: Zhang Y, Qu G, Xu P, Lin Y, Chen Z, Wierman A. Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. Performance Evaluation Review. 2023 Jun 19;51(1):83–4.

MLA: Zhang, Y., et al. “Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning.” Performance Evaluation Review, vol. 51, no. 1, June 2023, pp. 83–84. Scopus, doi:10.1145/3606376.3593545.

NLM: Zhang Y, Qu G, Xu P, Lin Y, Chen Z, Wierman A. Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning. Performance Evaluation Review. 2023 Jun 19;51(1):83–84.
