
多对多协同区域攻防博弈

Multi-Agent Cooperative Reach-Avoid Games

Author: 闫芮 (Yan Rui)
  • Student ID
    2015******
  • Degree
    Doctoral
  • Email
    yr1******.cn
  • Defense Date
    2020.12.07
  • Advisor
    钟宜生 (Zhong Yisheng)
  • Discipline
    Control Science and Engineering
  • Pages
    160
  • Confidentiality Level
    Public
  • Department
    025 Department of Automation
  • Keywords
    differential games, reach-avoid games, barrier, multi-agent reinforcement learning, maximum bipartite matching with conflict graph

Abstract


Reach-avoid games have broad application prospects in military confrontation, collision avoidance for unmanned ground and aerial vehicles, robot rescue in complex environments, and attack-defense games. This thesis investigates multi-agent cooperative reach-avoid games. First, the maximum bipartite matching problem with a conflict graph that arises between defense coalitions and attackers is discussed. Then, based on differential game theory and assuming simple dynamics and simple environments, multi-agent cooperative reach-avoid games in bounded closed convex planar regions, in three-dimensional space, and under several special setups are analyzed both qualitatively (the game of kind) and quantitatively (the game of degree). Finally, within a multi-agent reinforcement learning framework, a policy evaluation method based on the strict best response graph and the sink equilibrium is introduced, a policy seeking algorithm based on perturbed strict best response dynamics is designed, the associated convergence properties are proved, and the results are applied to matrix games and reach-avoid grid games. The main contributions of this thesis are as follows:

(1) A new class of maximum bipartite matching problems with a conflict graph is proposed and analyzed. This formulation covers situations in which several nodes on one side of a bipartite graph merge into a new coalition node that then joins the matching. The problem is proved to be NP-hard, and a polynomial-time approximation algorithm, the Sequential Matching Algorithm, is designed; the algorithm is proved to achieve a constant approximation factor, and an explicit constant is given (a greedy sketch of the matching setup follows this abstract).

(2) Multi-agent cooperative reach-avoid differential games with simple dynamics are investigated in bounded closed convex planar regions, in three-dimensional space, and under several special setups, for both point capture and radius capture. For the game of kind, analytical expressions of the barrier are constructed for the one-defending-one, two-defending-one, and many-defending-one cases; from these, the winning region of each team and necessary and sufficient conditions for determining the winner are derived. Tight upper bounds on the number of defenders required to capture one attacker before it reaches the goal region are given for the planar and three-dimensional cases. For the game of degree, optimal strategies for both teams are presented under the different outcomes of the game of kind. By combining the game-of-kind results with the maximum bipartite matching with a conflict graph, an open-loop task assignment that captures the maximum number of attackers is obtained. A cooperative defense strategy based on this open-loop assignment and receding horizon optimization is proposed, and it guarantees that the number of captured attackers is non-decreasing (both ideas are sketched below).

(3) Policy evaluation and policy seeking methods based on strict best responses are proposed for multi-agent reinforcement learning. The transition from stochastic games to meta-games is presented, the strict best response graph over meta-games is introduced, and the sink equilibrium defined on this graph is analyzed together with its properties. Two sink-equilibrium-based metrics, a cycle metric and a memory metric, are proposed for evaluating and ranking policies in multi-agent learning. By combining single-agent reinforcement learning with empirical game theory, a perturbed strict best response dynamics algorithm for policy seeking is designed, and its convergence to optimal or near-optimal policies is proved. The algorithm is then applied to matrix games and reach-avoid grid games (a sink-equilibrium sketch is given below).
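To make the coalition-matching setup of contribution (1) concrete, here is a minimal greedy sketch in Python. It is not the thesis's Sequential Matching Algorithm and carries no approximation guarantee; the coalition enumeration, the can_capture oracle, and all identifiers are illustrative assumptions. Coalitions (subsets of defenders) act as composite left-side nodes, each attacker can be matched at most once, and two coalition nodes conflict exactly when they share a defender.

from itertools import combinations

def sequential_matching(defenders, attackers, can_capture, max_coalition_size=2):
    """Greedily match defender coalitions to attackers under a conflict graph.

    can_capture(coalition, attacker) -> bool is an assumed oracle answering the
    game of kind: can this coalition capture that attacker? Two coalitions
    conflict iff they share a defender, so each defender is used at most once.
    """
    # Enumerate candidate coalition nodes (composite left-side nodes).
    coalitions = [frozenset(c)
                  for k in range(1, max_coalition_size + 1)
                  for c in combinations(defenders, k)]
    # Try small coalitions first so each capture spends few defenders.
    coalitions.sort(key=len)

    used_defenders, used_attackers, assignment = set(), set(), {}
    for coalition in coalitions:
        if coalition & used_defenders:  # conflict edge: shares a defender
            continue
        for attacker in attackers:
            if attacker not in used_attackers and can_capture(coalition, attacker):
                assignment[coalition] = attacker
                used_defenders |= coalition
                used_attackers.add(attacker)
                break
    return assignment

# Toy usage: d0 alone can stop a0, while d1 and d2 must team up against a1.
wins = {(frozenset({"d0"}), "a0"), (frozenset({"d1", "d2"}), "a1")}
print(sequential_matching(["d0", "d1", "d2"], ["a0", "a1"],
                          lambda c, a: (c, a) in wins))
# -> {frozenset({'d0'}): 'a0', frozenset({'d1', 'd2'}): 'a1'}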
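For contribution (2), the flavor of the one-defending-one game of kind under simple motion can be indicated with the classical dominance-region construction. This is a standard pursuit-evasion argument offered as a stand-in, not the thesis's barrier expressions. Assume the attacker $A$ and defender $D$ move with constant speeds $v_A$ and $v_D$, with speed ratio $\alpha = v_A / v_D < 1$. The set of points the attacker reaches strictly before the defender is the open Apollonius disc
\[
\mathcal{R}_A
= \Bigl\{\, x : \tfrac{\|x - x_A\|}{v_A} < \tfrac{\|x - x_D\|}{v_D} \,\Bigr\}
= \bigl\{\, x : \|x - x_A\| < \alpha \,\|x - x_D\| \,\bigr\},
\]
with center and radius
\[
c = \frac{x_A - \alpha^2 x_D}{1 - \alpha^2}, \qquad
r = \frac{\alpha \,\|x_A - x_D\|}{1 - \alpha^2}.
\]
Under point capture, the attacker can hope to win only if $\mathcal{R}_A$ intersects the goal region; otherwise the defender can guarantee interception. The barrier is the surface in the joint state space separating these two outcomes, which the thesis's multi-defender barrier expressions refine.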
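The receding-horizon cooperative defense of contribution (2) can be sketched with a self-contained toy in the plane. Everything here is an illustrative assumption: greedy nearest-attacker assignment stands in for the thesis's matching-based open-loop assignment, and pure pursuit stands in for the optimal game-of-degree strategies. The point being illustrated is structural: captured attackers are removed and the assignment is re-solved each horizon, so the capture count never decreases.

import numpy as np

def receding_horizon_defense(defenders, attackers, goal, v_d=1.2, v_a=1.0,
                             dt=0.05, steps=400, horizon=20,
                             capture_radius=0.1, goal_radius=0.1):
    """Toy receding-horizon defense with simple motion in the plane."""
    defenders = [np.asarray(d, float) for d in defenders]
    attackers = [np.asarray(a, float) for a in attackers]
    goal = np.asarray(goal, float)
    captured, assignment, stale = 0, {}, True
    for t in range(steps):
        if stale or t % horizon == 0:
            # Re-solve the assignment on the current state (greedy stand-in
            # for the open-loop matching with conflict graph).
            free, assignment = set(range(len(attackers))), {}
            for i, d in enumerate(defenders):
                if free:
                    j = min(free, key=lambda j: np.linalg.norm(attackers[j] - d))
                    assignment[i], stale = j, False
                    free.discard(j)
        # Simple motion: assigned defenders pursue, attackers head to the goal.
        for i, j in assignment.items():
            u = attackers[j] - defenders[i]
            defenders[i] += v_d * dt * u / (np.linalg.norm(u) + 1e-9)
        attackers = [a + v_a * dt * (goal - a) / (np.linalg.norm(goal - a) + 1e-9)
                     for a in attackers]
        # Remove captured attackers (capture count is non-decreasing) and
        # attackers that breached the goal; both invalidate the assignment.
        gone = {j for i, j in assignment.items()
                if np.linalg.norm(attackers[j] - defenders[i]) < capture_radius}
        captured += len(gone)
        gone |= {j for j, a in enumerate(attackers)
                 if np.linalg.norm(a - goal) < goal_radius}
        if gone:
            attackers = [a for j, a in enumerate(attackers) if j not in gone]
            assignment, stale = {}, True
        if not attackers:
            break
    return captured

# Two faster defenders against two attackers heading for a common goal.
print(receding_horizon_defense(defenders=[(0.0, 0.0), (1.0, 0.0)],
                               attackers=[(0.0, 2.0), (1.0, 2.0)],
                               goal=(0.5, -1.0)))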
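For contribution (3), a minimal sketch of how sink equilibria can be computed for a small matrix game: build the strict best response graph over joint pure-strategy profiles, then take the sink strongly connected components. The graph construction follows the definitions named in the abstract, but the code and its identifiers are illustrative assumptions, not the thesis's implementation.

import itertools
import numpy as np

def strict_best_response_graph(payoffs):
    """Edge s -> s' iff one player, holding the others fixed at s, switches
    to a best response that strictly improves on its current payoff.

    payoffs has shape (n_players, k_1, ..., k_n): payoffs[i][s] is player i's
    payoff at the joint pure-strategy profile s.
    """
    n_players, shape = payoffs.shape[0], payoffs.shape[1:]
    profiles = list(itertools.product(*(range(k) for k in shape)))
    edges = {s: set() for s in profiles}
    for s in profiles:
        for i in range(n_players):
            line = payoffs[(i,) + s[:i] + (slice(None),) + s[i + 1:]]
            best = line.max()
            if line[s[i]] < best:  # current strategy is not a best response
                for a in np.flatnonzero(line == best):
                    edges[s].add(s[:i] + (int(a),) + s[i + 1:])
    return edges

def sink_equilibria(edges):
    """Sink equilibria = strongly connected components (via Tarjan's
    algorithm) with no edge leaving the component."""
    index, low, on_stack, stack, comps, counter = {}, {}, set(), [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                visit(w); low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in edges:
        if v not in index:
            visit(v)
    return [c for c in comps if all(w in c for v in c for w in edges[v])]

# Matching pennies: strict best responses chase each other around a 4-cycle,
# so the unique sink equilibrium contains all four joint profiles.
p0 = np.array([[1, -1], [-1, 1]])
print(sink_equilibria(strict_best_response_graph(np.stack([p0, -p0]))))

A perturbed strict best response dynamics, in the sense used here, would then amount to a walk on this graph with occasional exploratory deviations; the thesis proves convergence properties for its own version of that dynamics.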