TY - JOUR
T1 - Value Functions Factorization With Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients
AU - Zhou, Hanhan
AU - Lan, Tian
AU - Aggarwal, Vaneet
N1 - KAUST Repository Item: Exported on 2023-09-06
PY - 2023/7/17
Y1 - 2023/7/17
N2 - The use of centralized training and decentralized execution for value function factorization demonstrates the potential for addressing cooperative multi-agent reinforcement learning tasks. QMIX, one such method, has emerged as the leading approach and has shown superior performance on the StarCraft II micromanagement benchmark. Nonetheless, the monotonic mixing of per-agent estimates in QMIX has limitations in representing joint action Q-values and may not provide enough global state information for accurately estimating single-agent value functions, which can lead to suboptimal results. To this end, we present LSF-SAC, a novel framework that features a variational inference-based information-sharing mechanism as extra state information to assist individual agents in the value function factorization. We demonstrate that such latent individual state information sharing can significantly expand the power of value function factorization, while fully decentralized execution can still be maintained in LSF-SAC through a soft-actor-critic design. We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks. We further conduct extensive ablation studies to locate the key factors accounting for its performance improvements. We believe that this new insight can lead to new local value estimation methods and variational deep learning algorithms.
AB - The use of centralized training and decentralized execution for value function factorization demonstrates the potential for addressing cooperative multi-agent reinforcement learning tasks. QMIX, one such method, has emerged as the leading approach and has shown superior performance on the StarCraft II micromanagement benchmark. Nonetheless, the monotonic mixing of per-agent estimates in QMIX has limitations in representing joint action Q-values and may not provide enough global state information for accurately estimating single-agent value functions, which can lead to suboptimal results. To this end, we present LSF-SAC, a novel framework that features a variational inference-based information-sharing mechanism as extra state information to assist individual agents in the value function factorization. We demonstrate that such latent individual state information sharing can significantly expand the power of value function factorization, while fully decentralized execution can still be maintained in LSF-SAC through a soft-actor-critic design. We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks. We further conduct extensive ablation studies to locate the key factors accounting for its performance improvements. We believe that this new insight can lead to new local value estimation methods and variational deep learning algorithms.
UR - http://hdl.handle.net/10754/694120
UR - https://ieeexplore.ieee.org/document/10185094/
UR - http://www.scopus.com/inward/record.url?scp=85165306848&partnerID=8YFLogxK
U2 - 10.1109/TETCI.2023.3293193
DO - 10.1109/TETCI.2023.3293193
M3 - Article
SN - 2471-285X
SP - 1
EP - 11
JO - IEEE Transactions on Emerging Topics in Computational Intelligence
JF - IEEE Transactions on Emerging Topics in Computational Intelligence
ER -