基于强化学习的多AUV协同水下目标围捕理论与算法

DRL-based Theory and Algorithms for Multi-AUV Cooperative Target Hunting in Underwater Environment

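Author: Wei Wei (魏维)
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    wei******.cn
  • Defense date
    2023.05.15
  • Advisor
    Dong Yuhan (董宇涵)
  • Discipline
    Electronic Information
  • Pages
    80
  • Confidentiality level
    Public
  • Department
    599 International Graduate School
  • Chinese keywords
    集群智能, 空海跨域协同, 水下目标围捕, 强化学习, 微分博弈
  • English keywords
    Swarm intelligence, sea-air cross-tier cooperation, underwater target hunting, deep reinforcement learning, differential game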

Abstract

National marine strategy calls for a real-time, accurate, and effective swarm-based network for tracking and hunting moving underwater targets, which would strengthen marine sensing and defense capabilities and protect national maritime rights and interests. However, existing underwater target tracking and hunting algorithms still face key bottlenecks: low positioning accuracy, complex underwater dynamics, harsh underwater channel conditions, large propagation delay in underwater acoustic communication, and highly intelligent targets. To improve the ability of Autonomous Underwater Vehicles (AUVs) to sense and defend against targets that appear at random in the ocean, this thesis aims to design an efficient multi-AUV underwater target hunting system.

First, the thesis introduces the background and state of the art of swarm intelligence and discusses sea-air cross-tier heterogeneous unmanned swarm systems. It then presents the concept and phases of the target hunting task and surveys three common classes of theory and methods for multi-AUV cooperative target hunting.

Second, the thesis formulates a target hunting problem that jointly maximizes the hunting success rate, maximizes the connectivity of the sea-air cross-tier cooperative system, and minimizes energy consumption. It proposes a cooperative path planning and resource allocation system that combines Unmanned Aerial Vehicles (UAVs), Unmanned Surface Vehicles (USVs), and AUVs. Simulation results show that the modified Deep Q-Network (DQN) algorithm and its variants achieve higher energy efficiency than traditional algorithms in cooperative target hunting and maintain connectivity across the heterogeneous UAV-USV-AUV system. The scheme also achieves a hunting success rate above 95% when the target appears within 100 m of the swarm center.

Third, the thesis builds a differential game model of the adversarial interaction between the multi-AUV team and a single target, drawing on differential game theory, AUV dynamics, underwater communication characteristics, and the formation consistency and obstacle avoidance requirements of the hunting task. Feedback control strategies are obtained by solving the Hamiltonian equation. The thesis then verifies the Lyapunov stability of the swarm system and proves mathematically that the underwater target hunting system is at a Nash equilibrium. Simulation results provide preliminary evidence that the differential game model jointly optimizes the payoff functions of the hunting system.

Finally, the thesis proposes an underwater cooperative target hunting method based on multi-agent reinforcement learning with real-time reward feedback. End-to-end multi-agent reinforcement learning yields an efficient and stable hunting strategy that maximizes the hunting success rate and the cooperation of the swarm while meeting the stability requirements of the adversarial system. Simulation results further verify the feasibility and effectiveness of the differential game model, and demonstrate the stability of the proposed multi-AUV cooperative hunting method and its Nash equilibrium state under the optimal hunting strategies. The simulations also analyze how underwater acoustic communication delay, ocean currents, and wind affect the AUVs as they move.
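
For readers unfamiliar with the DQN family referred to in the abstract, the following is a minimal sketch of the one-step Q-learning target that such methods regress toward, paired with a purely hypothetical scalar reward that combines the three objectives named above (capture success, connectivity, energy use). The function names, weights, and reward shape are illustrative assumptions and are not taken from the thesis.

```python
from typing import Sequence


def hunting_reward(captured: bool, connected: bool, energy_used: float,
                   w_capture: float = 10.0, w_link: float = 1.0,
                   w_energy: float = 0.1) -> float:
    """Hypothetical reward: bonus for capture and connectivity, penalty for energy.

    The weights are illustrative assumptions, not values from the thesis.
    """
    return (w_capture * float(captured)
            + w_link * float(connected)
            - w_energy * energy_used)


def td_target(reward: float, next_q_values: Sequence[float],
              gamma: float = 0.99, done: bool = False) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    At a terminal step (done=True) there is no bootstrap term.
    """
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def q_update(q_sa: float, target: float, lr: float = 0.1) -> float:
    """Move the current estimate Q(s, a) a fraction lr toward the target."""
    return q_sa + lr * (target - q_sa)
```

In a DQN, `next_q_values` would come from a target network and `q_update` would be replaced by a gradient step on the squared difference between the network output and `td_target`; the tabular update here only illustrates the direction of that step.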