Enhancing Global Credit Assignment Mechanism for Cooperative Multi-Agent Reinforcement Learning

Yao Xinghu1; Song Guangxin2


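Citation: YAO Xinghu, SONG Guangxin. Enhancing global credit assignment mechanism for cooperative multi-agent reinforcement learning[J]. Computing Technology and Automation, 2021, (1): 149-154.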

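Authors and affiliations:
Yao Xinghu1, Song Guangxin2
1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106
2. College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106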

Abstract: Global credit assignment mechanisms in cooperative multi-agent reinforcement learning struggle to capture the complex cooperative relationships among agents and cannot effectively handle non-Markovian reward signals. To address these problems, this paper proposes an enhanced global credit assignment mechanism for cooperative multi-agent reinforcement learning. First, a new global credit assignment structure based on reward highway connections is designed, which enables each agent to consider both the local reward signal assigned to it and the team's global reward signal when making decisions. Second, by integrating multi-step reward signals, a value function estimation method that can adapt to non-Markovian rewards is proposed. Experimental results on several complex scenarios of the StarCraft Multi-Agent Challenge (SMAC) micromanagement benchmark show that the proposed method not only achieves state-of-the-art performance but also greatly improves sample efficiency.

Keywords: deep learning; reinforcement learning; multi-agent systems
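The abstract does not give the formula used to integrate multi-step reward signals into value estimation. A standard way to build value targets from several future rewards is the n-step return, G_t = r_t + γ r_{t+1} + ... + γ^(n-1) r_{t+n-1} + γ^n V(s_{t+n}), which bootstraps from a value estimate only after n real rewards. The Python sketch below computes such targets from one episode of team rewards. It illustrates that generic technique under our own assumptions and is not the paper's estimator; the function name `n_step_targets` and its parameters are hypothetical.

```python
from typing import List, Sequence


def n_step_targets(rewards: Sequence[float],
                   values: Sequence[float],
                   dones: Sequence[bool],
                   gamma: float = 0.99,
                   n: int = 3) -> List[float]:
    """Hypothetical helper: n-step TD targets for one episode.

    rewards[t]: team (global) reward received after step t.
    values[t]:  bootstrap value estimate for state s_t; must have
                len(rewards) + 1 entries (the last one is for the
                state reached after the final step).
    dones[t]:   True if the episode terminated after step t.
    """
    T = len(rewards)
    if len(values) != T + 1 or len(dones) != T:
        raise ValueError("values must have len(rewards) + 1 entries, dones len(rewards)")

    targets: List[float] = []
    for t in range(T):
        g = 0.0          # accumulated discounted reward
        discount = 1.0   # gamma ** (number of rewards summed so far)
        terminated = False
        k = 0
        # Sum up to n real rewards, stopping early at episode end.
        while k < n and t + k < T:
            g += discount * rewards[t + k]
            discount *= gamma
            k += 1
            if dones[t + k - 1]:
                terminated = True
                break
        if not terminated:
            # Bootstrap from the value of the state reached after k steps.
            g += discount * values[t + k]
        targets.append(g)
    return targets


if __name__ == "__main__":
    print(n_step_targets([1, 1, 1], [0, 0, 0, 5], [False, False, False],
                         gamma=0.5, n=2))
```

For example, with gamma=0.5 and n=2, rewards [1, 1, 1] and bootstrap values [0, 0, 0, 5] give targets [1.5, 2.75, 3.5]; with n=1 the function reduces to the one-step TD target r_t + γ V(s_{t+1}). Larger n lets delayed rewards reach earlier states in fewer updates, at the cost of higher variance in the targets.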