


Research on Multi-Agent Reinforcement Learning Based on Diversity for Collaborative Scenarios

Author: 陈阳昆
  • Student ID
    2021******
  • Degree
    Master's
  • Email
    118******com
  • Defense Date
    2024.05.14
  • Supervisor
    李秀
  • Discipline
    Electronic Information
  • Pages
    87
  • Confidentiality Level
    Public
  • Affiliation
    599 International Graduate School
  • Chinese Keywords
    多智能体强化学习;探索任务;样本利用;研究环境
  • English Keywords
    Multi-Agent Reinforcement Learning; Exploration; Sample Reuse; Benchmark

Abstract


The advent of deep reinforcement learning has endowed agents with advanced decision-making capabilities. Multi-agent reinforcement learning (MARL), as a method for achieving cooperation and competition in multi-agent systems, has recently shown significant potential and broad application prospects in fields such as robotic control and traffic management. However, current MARL approaches face several challenges: the state-action space grows exponentially with the number of agents, which severely hampers exploration efficiency and, in turn, state-space diversity; sample utilization is low and limited to a single reuse pattern, which restricts training diversity; and validation environments are often small in scale, with fixed task metrics and no mature industrial training schemes, which weakens policy evaluation. Against this background, this thesis studies MARL algorithms and validates them in benchmark environments. The main contributions are as follows.

First, this thesis proposes a multi-agent exploration algorithm that selects key sub-states by conditional entropy. The algorithm identifies key sub-states of the joint state space via information entropy, updates them at fixed intervals, and feeds the masked states into an RND network to compute intrinsic rewards that encourage joint exploration among agents. The method is compatible with mainstream reinforcement learning algorithms, and experiments in a simulated box-pushing environment and on SMAC, a standard benchmark for multi-agent algorithms, confirm its effectiveness.

Second, this thesis introduces a multi-agent algorithm that repeats sample updates according to individual differences in order to fit value functions more accurately. The algorithm splits the off-policy network and applies a different number of repeated updates per sample depending on each agent's current state, fully exploiting individual identity information and markedly improving sample efficiency. Experiments on SMAC show that, for the same number of environment interaction steps, the algorithm achieves better results.

Third, this thesis conducts large-scale cooperative-adversarial experiments in the Neural MMO environment and presents a reinforcement learning training scheme for producing effective agents in multi-agent environments. Experimental results show that the resulting agent, which serves as the highest-difficulty built-in agent of the associated competition, achieves strong performance. Furthermore, competition analysis verifies that Neural MMO, used as a base environment for MARL research, can effectively test the robustness and generalization of algorithms, facilitating future academic work on large-scale multi-agent, multi-task cooperation scenarios.
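The first contribution can be illustrated with a toy sketch: select "key" sub-state dimensions by an entropy score, mask the joint state accordingly, and compute an RND-style intrinsic reward on the masked state. This is a minimal sketch under stated assumptions, not the thesis's implementation: the names `entropy_per_dim`, `select_key_dims`, and `TinyRND` are hypothetical, the per-dimension histogram entropy is a simplified stand-in for the conditional-entropy criterion, and the RND target/predictor networks are reduced to single linear maps.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_per_dim(states, bins=8):
    """Histogram entropy of each state dimension over visited samples
    (simplified stand-in for the thesis's conditional-entropy criterion)."""
    H = []
    for d in range(states.shape[1]):
        hist, _ = np.histogram(states[:, d], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        H.append(float(-(p * np.log(p)).sum()))
    return np.array(H)

def select_key_dims(states, k):
    """Keep the k most informative (highest-entropy) sub-state dimensions."""
    return np.argsort(entropy_per_dim(states))[-k:]

class TinyRND:
    """Random Network Distillation shrunk to single linear maps: a predictor
    is trained toward a fixed random target, so prediction error (the
    intrinsic reward) is high on novel states and decays on visited ones."""
    def __init__(self, dim, feat=16, lr=0.05, seed=1):
        r = np.random.default_rng(seed)
        self.target = r.normal(size=(dim, feat))  # fixed random target net
        self.pred = np.zeros((dim, feat))         # trainable predictor net
        self.lr = lr

    def intrinsic_reward(self, s):
        err = s @ self.pred - s @ self.target
        return float((err ** 2).mean())

    def update(self, s):
        err = s @ self.pred - s @ self.target     # gradient step on the MSE
        self.pred -= self.lr * np.outer(s, err)

# Demo: dims 0-2 vary freely (high entropy), dims 3-5 are almost constant.
states = np.concatenate(
    [rng.normal(size=(500, 3)),
     (rng.random((500, 3)) < 0.02).astype(float)], axis=1)

key_dims = select_key_dims(states, k=3)
mask = np.zeros(states.shape[1])
mask[key_dims] = 1.0                              # masking operation

rnd = TinyRND(dim=states.shape[1])
s_seen, s_new = states[0] * mask, states[1] * mask
r_before = rnd.intrinsic_reward(s_seen)
for _ in range(500):                              # repeatedly visit s_seen
    rnd.update(s_seen)
r_after, r_novel = rnd.intrinsic_reward(s_seen), rnd.intrinsic_reward(s_new)
```

After training, `r_after` falls well below both `r_before` and `r_novel`, so the intrinsic bonus steers agents toward unvisited key sub-states rather than rewarding already-explored ones.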