

Research on Multi-task Reinforcement Learning in Complex Environment

Author: 徐家卫
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    xuj******.cn
  • Defense date
    2023.05.14
  • Advisor
    袁春
  • Discipline
    Electronic Information
  • Pages
    73
  • Confidentiality level
    Public
  • Training unit
    599 International Graduate School
  • Keywords
    Reinforcement Learning, Multi-task Learning, Progressive Neural Network, Knowledge Distillation

Abstract


Reinforcement learning consists of two parts, the agent and the environment. The agent collects data and learns a policy through continual interaction with the environment, aiming to obtain the maximum return. Current reinforcement learning algorithms mainly focus on improving an agent's performance on a single task, and it is difficult for agents to learn multiple tasks from scratch. Therefore, this work integrates the ideas and methods of multi-task learning with reinforcement learning and designs multi-task reinforcement learning training frameworks so that agents can learn multiple tasks. Multi-task learning is a branch of machine learning that gives a model the ability to learn several tasks. Inspired by the progressive neural network and knowledge distillation methods from multi-task learning, this work designs multi-task reinforcement learning training frameworks for both single-environment and multi-environment settings.

In the single-environment setting, the agent pursues a single final goal but must master multiple skills to achieve it. This work chooses the ViZDoom environment, a first-person-shooter multi-agent adversarial environment. To obtain the highest score in ViZDoom, an agent must master skills such as navigation, shooting, anticipation, and memory. Existing approaches mainly rely on human prior knowledge or complex neural networks and train all of the agent's skills simultaneously, resulting in a single, fixed policy. This work instead regards mastering multiple skills as completing multiple tasks and establishes a multi-task learning framework that lets the agent acquire them. The progressive neural network in multi-task learning lets a model learn one task at a time while reusing previously learned knowledge whenever a new task is learned.
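The progressive-network mechanism described above can be sketched in a few lines: each new task gets a fresh "column" of weights, earlier columns are frozen, and lateral connections feed their hidden features into the new column. The following is a minimal numpy illustration of that idea only; the class, weight names, and dimensions are ours for illustration, not the thesis's actual ViZDoom architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class ProgressiveNet:
    """Minimal sketch of a progressive neural network: each new task adds
    a fresh column; earlier columns stay frozen and feed their hidden
    activations into the new column through lateral weight matrices."""

    def __init__(self, in_dim, hid_dim, out_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.in_dim, self.hid_dim, self.out_dim = in_dim, hid_dim, out_dim
        self.columns = []  # one dict of weight matrices per task

    def add_column(self):
        """Add a column for a new task, with one lateral matrix per
        previously trained (and now frozen) column."""
        c = {
            "W1": self.rng.normal(0, 0.1, (self.hid_dim, self.in_dim)),
            "W2": self.rng.normal(0, 0.1, (self.out_dim, self.hid_dim)),
            "U": [self.rng.normal(0, 0.1, (self.hid_dim, self.hid_dim))
                  for _ in self.columns],
        }
        self.columns.append(c)
        return len(self.columns) - 1

    def forward(self, x, task):
        """Run input through columns 0..task; only column `task` would be
        trained, the others only contribute frozen hidden features."""
        hiddens = []
        for k in range(task + 1):
            c = self.columns[k]
            h = c["W1"] @ x
            for j, U in enumerate(c["U"]):  # lateral connections
                h = h + U @ hiddens[j]
            h = relu(h)
            hiddens.append(h)
        return self.columns[task]["W2"] @ hiddens[task]
```

Because earlier columns are never updated, training column k cannot degrade performance on tasks 0..k-1, which is what makes stage-by-stage skill acquisition possible.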
Inspired by this, this work designs a multi-task training framework based on progressive learning that trains the agent on different tasks in separate stages, yielding an agent that can adaptively adjust its policy. Experiments show that the agent trained in this work obtains the highest score when fighting against the champion AIs of previous ViZDoom competitions.

In the multi-environment setting, the agent needs to interact with several different environments simultaneously and complete the task in each of them. The ability to learn multiple tasks is crucial for improving an agent's generalization, so this work designs a multi-task learning framework for the Atari and Mujoco environments. Knowledge distillation in multi-task learning fuses two or more models so that the fused model can handle multiple tasks. Inspired by this, this work designs a teacher-guided multi-task training framework in which the teacher agents for the individual tasks jointly guide a single student agent until it can perform the corresponding task in every environment. Unlike other methods that provide guidance only during policy updates, this work also gives the student agent action suggestions while it collects data. Furthermore, after multi-task learning, this work designs a partial initialization scheme for the neural network that lets the student policy exploit its multi-task knowledge and adapt quickly to new tasks. The multi-task training framework designed in this work achieves good results in the Atari and Mujoco environments.
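The two ingredients of the teacher-guided framework, distilling a teacher's action distribution into the student and letting the teacher suggest actions during data collection, can be illustrated with a toy sketch. It assumes linear softmax policies; all names and the `beta` mixing parameter are hypothetical, not the thesis's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def distill_step(student_W, teacher_W, state, lr=0.1):
    """One policy-distillation update on a single state: pull the
    student's action distribution toward the teacher's. The gradient of
    KL(teacher || student) w.r.t. the student logits is p_s - p_t."""
    p_t = softmax(teacher_W @ state)
    p_s = softmax(student_W @ state)
    grad_logits = p_s - p_t
    return student_W - lr * np.outer(grad_logits, state)

def collect_action(student_W, teacher_W, state, beta, rng):
    """During data collection, with probability beta follow the teacher's
    suggested action instead of the student's own (a simple form of the
    action-suggestion idea described above)."""
    W = teacher_W if rng.random() < beta else student_W
    return int(np.argmax(W @ state))
```

Repeating `distill_step` on the states the student visits drives the KL divergence between teacher and student policies toward zero; with several teachers, each state's update would use the teacher responsible for that environment.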