
自动驾驶策略改进和漏洞挖掘算法研究

Research on Autonomous Driving Policy Improvement and Vulnerability Discovery Algorithms

Author: 穆野
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    muy******.cn
  • Defense date
    2023.05.16
  • Advisor
    廖庆敏
  • Discipline
    Electronic Information
  • Pages
    44
  • Confidentiality level
    Public
  • Institution
    599 International Graduate School
  • Chinese keywords
    自动驾驶,危险驾驶场景生成,多智能体强化学习,自动驾驶策略提升
  • English keywords
    Autonomous Driving, Safety-critical Driving Scenario Generation, Multi-Agent Reinforcement Learning, Autonomous Driving Policy Improvement

Abstract


Traffic safety is an important issue in daily human life. With the rapid development of semiconductor technology and deep learning, autonomous driving technology is expected to become an efficient and safe solution. At present, the "long tail problem" is a key obstacle to deploying autonomous driving: although iterative training has brought autonomous driving systems to human-level performance in the vast majority of common scenarios, the complexity of real-world traffic means there remain rare, yet numerous, hazardous scenarios in which these systems may harbor safety vulnerabilities. Autonomous driving systems therefore require thorough safety testing before real-world deployment, followed by iterative improvement based on the test results. This thesis studies the long tail problem of autonomous driving systems, analyzes the key issues in safety testing and driving policy improvement, and proposes corresponding solutions.

In terms of safety testing, real-world road testing suffers from high cost, low efficiency, and incomplete coverage of accident vulnerabilities. The main alternative is to test autonomous driving policies in a simulator. However, existing safety testing methods cover only limited road scenarios and accident types and do not determine accident responsibility; an efficient, reasonable, and comprehensive vulnerability discovery algorithm is still lacking. To address this problem, this thesis proposes a vulnerability discovery framework for autonomous driving policies based on multi-agent reinforcement learning. Specifically, adversarial attack agents are trained with multi-agent reinforcement learning to cooperatively construct rare hazardous scenarios, and a responsibility arbitration reward and a scenario difference reward are designed to determine accident responsibility and to discover diverse hazardous scenarios.
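The two reward terms described above could be combined along the following lines. This is a minimal sketch, not the thesis's actual formulation: the function names, the ±1 responsibility signal, the distance-to-archive diversity term, and the linear weighting are all illustrative assumptions.

```python
import math

def adversarial_reward(caused_collision, attacker_at_fault,
                       scenario_embedding, archive,
                       w_resp=1.0, w_div=0.5):
    """Sketch of a combined reward for an adversarial attack agent.

    Responsibility arbitration: a collision is rewarded only when the
    attacked (ego) policy is at fault, so trivial ramming is penalized.
    Scenario difference: scenarios far from previously discovered ones
    earn extra reward, encouraging diverse failure modes.
    All names and the weighting scheme are illustrative assumptions.
    """
    # Responsibility arbitration term: +1 for an ego-responsible crash,
    # -1 if the attacker itself caused the crash, 0 with no crash.
    if caused_collision:
        r_resp = 1.0 if not attacker_at_fault else -1.0
    else:
        r_resp = 0.0

    # Scenario difference term: distance from this scenario's feature
    # vector to the nearest previously archived scenario.
    if archive:
        r_div = min(math.dist(scenario_embedding, past) for past in archive)
    else:
        r_div = 0.0

    return w_resp * r_resp + w_div * r_div
```

The sign convention on the responsibility term is what prevents the attackers from degenerating into simply crashing into the ego vehicle, which would produce accidents the ego policy is not responsible for.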
In addition, this thesis proposes a method for training attack policies with transfer capability by constructing a driving policy pool. Vulnerability discovery experiments on multiple driving policies in multiple simulation environments show that, compared with baseline methods, the proposed method finds accident scenarios caused by rare decision vulnerabilities of autonomous driving policies more efficiently and more diversely.

In terms of policy improvement, when a new accident scenario is encountered, existing methods usually analyze the scenario trajectory and extract the accident-related parts in order to improve the autonomous driving policy. However, accident scenarios are sparse and account for a very small proportion of all driving scenarios, so the improved policy is prone to overfitting to them and behaving overly conservatively. To improve the driving policy while avoiding overfitting and without degrading its driving performance, this thesis first builds on the proposed vulnerability discovery framework to develop an action-entropy-based method for extracting key state-action pairs from accident scenarios, and then proposes an action-entropy-based iterative improvement algorithm for autonomous driving policies, completing the full pipeline from vulnerability discovery to policy improvement. Experiments show that the proposed method iteratively improves autonomous driving policies without overfitting to accident scenarios and without degrading performance in common driving scenarios.
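The action-entropy idea can be sketched as follows. The entropy of the policy's action distribution measures how decisive the policy was at each state; the selection rule below (keep the lowest-entropy pairs near an accident, on the reasoning that confident wrong decisions are the ones worth retraining on) is an illustrative assumption, not the thesis's exact criterion.

```python
import math

def action_entropy(probs):
    """Shannon entropy of a policy's action distribution at one state."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def extract_key_pairs(trajectory, k=2):
    """Sketch: select the k state-action pairs with the lowest action
    entropy from an accident trajectory, i.e. the points where the policy
    committed most confidently to its (ultimately unsafe) decision.

    `trajectory` is a list of (state, action, action_probs) tuples; this
    data layout and the selection rule are illustrative assumptions.
    """
    scored = sorted(trajectory, key=lambda t: action_entropy(t[2]))
    return [(s, a) for s, a, _ in scored[:k]]
```

Restricting the policy update to these few key pairs, rather than to entire accident trajectories, is one plausible way the overfitting described above could be limited: the common-scenario behavior encoded in the rest of the trajectory is left untouched.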