Enhancing Global Credit Assignment Mechanism for Cooperative Multi-Agent Reinforcement Learning
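Citation: Yao Xinghu, Song Guangxin. Enhancing Global Credit Assignment Mechanism for Cooperative Multi-Agent Reinforcement Learning[J]. 计算技术与自动化 (Computing Technology and Automation), 2021, (1): 149-154.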
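Authors and affiliations
Yao Xinghu (姚兴虎)1, Song Guangxin (宋光鑫)2
1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106
2. College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106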
Abstract: Global credit assignment mechanisms in cooperative multi-agent reinforcement learning struggle to capture the complex cooperative relationships among agents and cannot effectively handle non-Markovian reward signals. To address both problems, this paper proposes an enhanced global credit assignment mechanism. First, a new global credit assignment structure based on a reward highway connection is designed, so that each agent, when making decisions, takes into account both the local reward signal assigned to it and the team's global reward signal. Second, a value function estimation method that adapts to non-Markovian rewards is proposed by integrating multi-step reward signals. Experiments on several complex scenarios of the StarCraft micromanagement benchmark (StarCraft Multi-Agent Challenge) show that the proposed method not only achieves state-of-the-art performance but also greatly improves sample efficiency.
Keywords: deep learning; reinforcement learning; multi-agent systems
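The abstract does not state the formula of the value estimator. As a rough illustration only, the sketch below computes a generic n-step return target, one standard way to fold several future rewards into a single value target. It is not the authors' method; the function name `n_step_targets` and its parameters are hypothetical.

```python
from typing import List


def n_step_targets(rewards: List[float], values: List[float],
                   gamma: float, n: int) -> List[float]:
    """Generic n-step bootstrapped targets for one finished episode.

    Assumptions (not taken from the paper):
    - rewards[t] is the team reward received after step t;
    - values[t] is the current value estimate at step t,
      with len(values) == len(rewards);
    - anything past the end of the episode has value 0.
    """
    T = len(rewards)
    targets = []
    for t in range(T):
        end = min(t + n, T)
        # Discounted sum of up to n real rewards starting at step t.
        g = 0.0
        for k in range(t, end):
            g += (gamma ** (k - t)) * rewards[k]
        # Bootstrap from the value estimate only if the episode continues.
        if end < T:
            g += (gamma ** (end - t)) * values[end]
        targets.append(g)
    return targets


# A delayed reward: only the last step pays off, and the
# value estimates are still uninformative (all zero).
rewards = [0.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0]
print(n_step_targets(rewards, values, gamma=0.5, n=1))  # [0.0, 0.0, 1.0]
print(n_step_targets(rewards, values, gamma=0.5, n=3))  # [0.25, 0.5, 1.0]
```

With n=1 the early steps receive no learning signal until the value estimates improve, whereas with n=3 the delayed reward reaches them immediately. This is the kind of effect that motivates multi-step targets when rewards are delayed or depend on history.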