Value propagation for decentralized networked deep multi-agent reinforcement learning

Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong

Research output: Contribution to journalConference articlepeer-review


We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve joint success. This problem is widely encountered in many areas including traffic control, distributed control, and smart grids. We assume each agent is located at a node of a communication network and can exchange information only with its neighbors. Using softmax temporal consistency, we derive a primal-dual decentralized optimization method and obtain a principled and data-efficient iterative algorithm named value propagation. We prove a non-asymptotic convergence rate of O(1/T) with nonlinear function approximation. To the best of our knowledge, it is the first MARL algorithm with a convergence guarantee in the control, off-policy, non-linear function approximation, fully decentralized setting.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
StatePublished - 2019
Event33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: 8 Dec 201914 Dec 2019

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Signal Processing
  • Computer Networks and Communications


Dive into the research topics of 'Value propagation for decentralized networked deep multi-agent reinforcement learning'. Together they form a unique fingerprint.

Cite this