TY - GEN
T1 - Optimizing Multi-Agent Coordination via Hierarchical Graph Probabilistic Recursive Reasoning
AU - Cohen, Saar
AU - Agmon, Noa
N1 - Publisher Copyright: © 2022 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved
PY - 2022
Y1 - 2022
N2 - Multi-agent reinforcement learning (MARL) requires coordination by some means of interaction between agents to efficiently solve tasks. Interaction graphs allow reasoning about joint actions based on the local structure of interactions, but they disregard the potential impact of an agent's action on its neighbors' behaviors, which can change rapidly in dynamic settings. In this paper, we thus present a novel perspective on opponent modeling in domains with only local interactions using (level-1) Graph Probabilistic Recursive Reasoning (GrPR2). Unlike previous work on recursive reasoning, each agent iteratively best-responds to other agents' policies over all possible local interactions. Agents' policies are approximated via a variational Bayes scheme that captures their uncertainties, and we prove that an induced variant of Q-learning converges under self-play when there exists only one Nash equilibrium. In cooperative settings, we further devise a variational lower bound on the likelihood of each agent's optimality. As opposed to other models, optimizing the resulting objective prevents each agent from attaining an unrealistic model of others, and yields an exact tabular Q-iteration method with convergence guarantees. Then, we deepen the recursion to level-k via Cognitive Hierarchy GrPR2 (GrPR2-CH), which lets each level-k player best-respond to a mixture of strictly lower levels in the hierarchy. We prove that: (1) level-3 reasoning is the optimal hierarchical level, maximizing each agent's expected return; and (2) a weak spot of classical CH models is their uniformly distributed level-0 policy, which may introduce policy bias. Finally, we propose a practical actor-critic scheme, and illustrate that GrPR2-CH outperforms strong MARL baselines in the particle environment.
KW - Cognitive Hierarchy
KW - Interaction Graphs
KW - Multi-Agent Coordination
KW - Multi-Agent Reinforcement Learning
KW - Variational Inference
UR - http://www.scopus.com/inward/record.url?scp=85134302741&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 290
EP - 299
BT - International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
T2 - 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Y2 - 9 May 2022 through 13 May 2022
ER -