Presentation Information

[3Yin-A-56] Scaling MARL-CPC: Achieving Decentralized Coordination in Multi-Agent Environments

〇Pyii Phyo Maung1, Naoto Yoshida1, Tadahiro Taniguchi1,2 (1. Kyoto University, 2. Ritsumeikan University)

Keywords:

Multi-Agent Reinforcement Learning, Collective Predictive Coding, Emergent Communication

In this work, we extend the MARL-CPC (Multi-Agent Reinforcement Learning with Collective Predictive Coding) framework to scale better with the number of agents. Previous studies demonstrated that MARL-CPC enables decentralized agents to learn meaningful communication through reward-independent variational inference; however, the scalability of the algorithm was not systematically investigated. We first evaluate the scalability of the original MARL-CPC framework and identify its limitations. To address these limitations, we propose a multi-round message passing architecture in which agents exchange messages over multiple rounds. We introduce two variants: Final Round, in which the CPC loss is computed only on the final-round messages, and Every Round, in which the CPC loss is accumulated across all rounds. Using a bandit coordination task, we empirically demonstrate the effectiveness of the Every Round method.
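The difference between the two variants can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_update` stands in for one round of message passing, `toy_loss` stands in for the CPC loss (the actual loss is a variational objective that is not specified in this abstract), and "accumulated" is interpreted here as a plain sum over rounds.

```python
from typing import Callable, List

Messages = List[float]


def run_rounds(initial: Messages,
               update: Callable[[Messages], Messages],
               num_rounds: int) -> List[Messages]:
    """Apply the update once per round and keep the messages of every round."""
    history = []
    messages = list(initial)
    for _ in range(num_rounds):
        messages = update(messages)
        history.append(messages)
    return history


def final_round_loss(history: List[Messages],
                     loss: Callable[[Messages], float]) -> float:
    """Final Round variant: the loss sees only the last round's messages."""
    return loss(history[-1])


def every_round_loss(history: List[Messages],
                     loss: Callable[[Messages], float]) -> float:
    """Every Round variant: the loss is summed over all rounds (assumed sum)."""
    return sum(loss(messages) for messages in history)


def toy_update(messages: Messages) -> Messages:
    """Hypothetical round: each agent moves halfway toward the group mean."""
    mean = sum(messages) / len(messages)
    return [(m + mean) / 2 for m in messages]


def toy_loss(messages: Messages) -> float:
    """Placeholder loss (NOT the CPC loss): squared deviation from the mean."""
    mean = sum(messages) / len(messages)
    return sum((m - mean) ** 2 for m in messages)


history = run_rounds([0.0, 2.0, 4.0], toy_update, num_rounds=3)
final_loss = final_round_loss(history, toy_loss)
every_loss = every_round_loss(history, toy_loss)
print(final_loss, every_loss)  # prints: 0.125 2.625
```

In this toy setting the Every Round total also includes the loss from the intermediate rounds, so earlier message exchanges contribute to the training signal directly, while Final Round only scores the last exchange.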