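In recent years, with the rapid development of artificial intelligence, big data, and related technologies, autonomous vehicles have entered a new stage of rapid technological evolution. In the real world, autonomous vehicles often operate in non-stationary (time-varying) traffic environments whose dynamic characteristics change over time. The sources of this non-stationarity are complex and take many forms. They include, but are not limited to, in-vehicle factors, such as changes in the vehicle's own kinematic and dynamic characteristics over time and the dynamic time delay of the control response, and external factors, such as changing road conditions and the changing characteristics of surrounding traffic participants. In a highly non-stationary traffic environment, a fixed driving policy, whether rule-based or learned, often cannot adapt to environmental changes and therefore loses the optimality of its decisions. More importantly, environmental non-stationarity may cause the autonomous vehicle to misjudge its safety boundary, aggravating the safety hazards of the autonomous driving system and threatening the personal safety of passengers and other traffic participants. To address the suboptimality and safety problems caused by the non-stationarity of the traffic environment, this study investigates modeling and decision-making methods for autonomous vehicles in non-stationary environments, proposes an efficient, flexible, safe, and controllable decision-making framework for autonomous driving in such environments, and studies methods for quantitatively evaluating the learning-evolution ability of autonomous vehicles. First, this study proposes data-driven methods for modeling autonomous vehicles in non-stationary environments. For non-stationary environments with discrete latent variables and clear boundaries, and for those with continuous latent variables and fuzzy boundaries, it proposes, respectively, a mixture-of-experts model based on an infinite mixture of Gaussian processes and a probabilistic latent variable model based on Bayesian inference, both used to predict the dynamics of non-stationary driving environments. In addition, a delay-aware Markov decision process model is established to handle actuator delay efficiently. For the highly complex, strongly nonlinear probabilistic dynamics models obtained above, this study further proposes a model predictive control method based on Monte Carlo sampling and best-sample selection, which optimizes decision sequences efficiently through particle sampling and thereby enables real-time decision-making. To make full use of historical decision information, a method for screening high-quality decision sequences based on cross-entropy optimization is introduced, improving both the quality and the efficiency of decision-making, and chance constraints are used to achieve safe decisions with high confidence. Finally, a method for quantitatively evaluating the learning-evolution ability of autonomous vehicles is proposed, filling the gap left by current mainstream autonomous driving evaluation systems, which do not assess learning-evolution ability. To make this evaluation efficient, comprehensive, and fair, a non-stationary test-scenario generation method based on ensemble adversarial reinforcement learning is proposed, which generates diverse, challenging scenarios tailored to the behavior of the vehicle under test; based on Nash equilibrium theory, adaptive weighting across multiple scenarios and a unified quantitative evaluation are achieved. To verify the practical effectiveness of the proposed methods, simulation experiments, hardware-in-the-loop experiments, and real-vehicle experiments in non-stationary traffic environments were designed and conducted. The experimental results show that, while maintaining real-time computation, the proposed methods enable adaptive, safe decision-making and control of autonomous vehicles in several typical non-stationary environments.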
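The delay-aware Markov decision process is only named above. The sketch below illustrates the common state-augmentation idea behind such models; it is an assumption, and the exact formulation in this study may differ. An action issued at step t takes effect at step t + d. The agent observes the augmented state (x_t, a_{t-d}, ..., a_{t-1}), which restores the Markov property that the raw state x_t loses under actuator delay. The point-mass model `point_mass_step`, the class `DelayAwareEnv`, and all numeric values are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Tuple

State = Tuple[float, float]  # (position [m], speed [m/s]) of a hypothetical point mass


def point_mass_step(state: State, accel: float, dt: float = 0.1) -> State:
    """Hypothetical longitudinal model: one explicit Euler step of a point mass."""
    pos, vel = state
    return (pos + vel * dt, vel + accel * dt)


class DelayAwareEnv:
    """Wraps a dynamics function with a fixed actuator delay of `delay` steps.

    An action issued at step t is executed at step t + delay. The observation is the
    augmented state (x_t, (a_{t-delay}, ..., a_{t-1})); together with the new action
    it determines the next augmented state, so the delayed process is Markov again.
    """

    def __init__(self, step_fn: Callable[[State, float], State], init_state: State,
                 delay: int, default_action: float = 0.0) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.step_fn = step_fn
        self.state = init_state
        # Issued-but-not-yet-executed actions, oldest first, pre-filled with a default.
        self.pending: Deque[float] = deque([default_action] * delay)

    def observation(self) -> Tuple[State, Tuple[float, ...]]:
        return self.state, tuple(self.pending)

    def step(self, action: float) -> Tuple[State, Tuple[float, ...]]:
        self.pending.append(action)
        executed = self.pending.popleft()  # the action issued `delay` steps ago
        self.state = self.step_fn(self.state, executed)
        return self.observation()
```

With `delay=2`, the first two calls to `step` still execute the default action, so an acceleration issued on the first call changes the speed only on the third call. A learned dynamics model or policy that conditions on the full augmented observation can therefore account for the delay explicitly.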
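The combination of sampling-based model predictive control, cross-entropy screening of decision sequences, and chance constraints described above can be illustrated on a toy problem. The sketch below is an assumed, simplified instance of cross-entropy-method MPC with a Monte Carlo chance constraint, not the study's implementation. A vehicle tracks a reference speed while keeping the estimated probability of crossing a stop line `p_limit` at or below `eps`. Model uncertainty is represented by hypothetical model particles, each an actuator gain drawn from N(1.0, 0.1²). Sampled action sequences with an estimated violation rate at or below `eps` rank first, and the elite sequences refit the Gaussian sampling distribution. The previous plan can warm-start the next solve. The names `simulate`, `evaluate`, and `cem_mpc`, as well as all numeric values, are illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple

DT, A_MAX = 0.1, 3.0  # hypothetical time step [s] and acceleration bound [m/s^2]


def simulate(p: float, v: float, actions: Sequence[float], gain: float,
             v_ref: float) -> Tuple[float, float]:
    """Roll one model particle forward; return (speed-tracking cost, max position)."""
    cost, p_max = 0.0, p
    for a in actions:
        v = max(0.0, v + gain * a * DT)  # uncertain actuator gain; no reversing
        p += v * DT
        p_max = max(p_max, p)
        cost += (v - v_ref) ** 2 + 0.1 * a * a
    return cost, p_max


def evaluate(p: float, v: float, actions: Sequence[float], gains: Sequence[float],
             v_ref: float, p_limit: float) -> Tuple[float, float, float]:
    """Monte Carlo estimates over particles: (mean cost, mean overshoot, violation rate)."""
    total_cost, total_over, violations = 0.0, 0.0, 0
    for g in gains:
        c, p_max = simulate(p, v, actions, g, v_ref)
        total_cost += c
        total_over += max(0.0, p_max - p_limit)
        violations += int(p_max > p_limit)
    n = len(gains)
    return total_cost / n, total_over / n, violations / n


def cem_mpc(p: float, v: float, v_ref: float, p_limit: float, horizon: int = 20,
            n_samples: int = 64, n_elites: int = 8, n_iters: int = 10,
            n_particles: int = 20, eps: float = 0.05,
            warm_start: Optional[Sequence[float]] = None,
            seed: int = 0) -> Tuple[float, List[float], float]:
    """Return (first action, mean plan, estimated violation rate of the mean plan)."""
    rng = random.Random(seed)
    # Particles of the (hypothetical) probabilistic model: one actuator gain each.
    gains = [rng.gauss(1.0, 0.1) for _ in range(n_particles)]
    mean = list(warm_start) if warm_start is not None else [0.0] * horizon
    if len(mean) != horizon:
        raise ValueError("warm_start must have length `horizon`")
    std = [A_MAX / 2] * horizon
    for _ in range(n_iters):
        scored = []
        for _ in range(n_samples):
            seq = [min(A_MAX, max(-A_MAX, rng.gauss(m, s))) for m, s in zip(mean, std)]
            cost, over, rate = evaluate(p, v, seq, gains, v_ref, p_limit)
            # Chance constraint P(max position > p_limit) <= eps: feasible sequences sort
            # first; infeasible ones are ordered by cost plus a penalty on mean overshoot.
            scored.append(((rate > eps, cost + 1e3 * over), seq))
        scored.sort(key=lambda item: item[0])
        elites = [seq for _, seq in scored[:n_elites]]
        # Cross-entropy update: refit the per-step Gaussian to the elite sequences.
        columns = list(zip(*elites))
        mean = [sum(col) / n_elites for col in columns]
        std = [max(0.05, (sum((x - m) ** 2 for x in col) / n_elites) ** 0.5)
               for col, m in zip(columns, mean)]
    _, _, rate = evaluate(p, v, mean, gains, v_ref, p_limit)  # in-sample estimate
    return mean[0], mean, rate
```

In a receding-horizon loop, only the returned action would be executed. Shifting the plan by one step (`plan[1:] + [0.0]`) and passing it as `warm_start` reuses the previous decision sequence, which is one simple way to exploit historical decision information. The cost per solve grows linearly with the horizon, the number of samples, and the number of particles.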
In recent years, with the rapid development of artificial intelligence, big data, and related technologies, autonomous vehicles have entered a new stage of rapid technological evolution. Autonomous vehicles in the real world often operate in complex, highly non-stationary traffic environments. The sources of this non-stationarity are complex and varied. They include, but are not limited to, in-vehicle factors, such as changes in the vehicle's kinematic and dynamic characteristics over time and the dynamic time delay of the control response, and external factors, such as changes in road conditions and in the characteristics of surrounding traffic participants. In a highly non-stationary traffic environment, a fixed rule-based or learned driving policy often cannot adapt to environmental changes and thus loses decision-making optimality. More importantly, environmental non-stationarity may cause autonomous vehicles to misjudge their safety boundaries, aggravating the safety hazards of the autonomous driving system and threatening the personal safety of passengers and other traffic participants. To address the suboptimality and safety problems that non-stationary traffic environments cause for autonomous driving systems, this research studies modeling and decision-making methods for autonomous vehicles in non-stationary environments. It proposes an efficient, flexible, safe, and controllable decision-making framework for autonomous driving in such environments. A method for quantitatively evaluating the learning-evolution ability of autonomous vehicles is also studied. First, this study proposes a data-driven approach to modeling autonomous vehicles in non-stationary environments.
For discrete non-stationary environments with clear boundaries and continuous non-stationary environments with fuzzy boundaries, a mixture-of-experts model based on an infinite mixture of Gaussian processes and a probabilistic latent variable model based on Bayesian inference, respectively, are proposed for dynamics prediction. In addition, a delay-aware Markov decision process model is established to handle the actuator's time-delay effect efficiently. Because the probabilistic dynamics models obtained above are highly complex and strongly nonlinear, this study further proposes a sampling-based model predictive control method with linear time complexity, which efficiently optimizes decision sequences through particle sampling. To make full use of historical decision-making information, a method for screening high-quality decision sequences based on cross-entropy optimization is introduced, improving the quality and efficiency of decision-making. Finally, to evaluate the learning-evolution ability of autonomous driving agents efficiently and quantitatively, a non-stationary test-scenario generation method based on ensemble adversarial reinforcement learning is proposed. Adversarial reinforcement learning combined with ensemble learning generates diverse and challenging scenarios for the vehicle under test, and Nash-equilibrium-based adaptive weighting across scenarios enables a unified quantitative evaluation of autonomous driving policies over multiple scenarios. To verify the practical effectiveness of the proposed methods, simulation experiments, hardware-in-the-loop experiments, and real-vehicle experiments in non-stationary traffic environments were designed and conducted.
The experimental results show that the proposed methods enable adaptive, safe planning and control of autonomous vehicles in non-stationary environments while maintaining real-time computational performance.
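Nash-equilibrium-based adaptive weighting of test scenarios can be sketched under one common assumption: evaluation is treated as a zero-sum game. The tested policies choose mixed strategies to maximize their normalized score, while the scenario side chooses weights to minimize it. The equilibrium weights therefore emphasize scenarios that remain hard even for the strongest policies. The study may define the game differently. In the sketch below, both players run Hedge (multiplicative weights). Their time-averaged strategies approach a Nash equilibrium of the zero-sum game, and the scenario player's average strategy is returned as the scenario weights. The function names, the normalization of scores to [0, 1], and the values of `rounds` and `eta` are hypothetical choices.

```python
import math
from typing import List, Sequence


def _softmax(values: Sequence[float], eta: float) -> List[float]:
    """Hedge weights proportional to exp(eta * value), computed stably."""
    top = max(values)
    ex = [math.exp(eta * (x - top)) for x in values]
    total = sum(ex)
    return [e / total for e in ex]


def nash_scenario_weights(payoff: Sequence[Sequence[float]], rounds: int = 2000,
                          eta: float = 0.1) -> List[float]:
    """payoff[i][j]: normalized score in [0, 1] of policy i in scenario j.

    Policies (rows) maximize and scenarios (columns) minimize the expected score.
    Both sides run Hedge; the time-averaged scenario strategy approximates the
    scenario player's Nash-equilibrium mixed strategy and is used as the weights.
    """
    n_pol, n_scn = len(payoff), len(payoff[0])
    cum_pol = [0.0] * n_pol  # cumulative expected score of each policy
    cum_scn = [0.0] * n_scn  # cumulative negated expected score of each scenario
    avg_scn = [0.0] * n_scn
    for _ in range(rounds):
        x = _softmax(cum_pol, eta)  # current mixed strategy over policies
        y = _softmax(cum_scn, eta)  # current mixed strategy over scenarios
        for j in range(n_scn):
            avg_scn[j] += y[j] / rounds
        for i in range(n_pol):
            cum_pol[i] += sum(payoff[i][j] * y[j] for j in range(n_scn))
        for j in range(n_scn):
            cum_scn[j] -= sum(payoff[i][j] * x[i] for i in range(n_pol))
    return avg_scn


def unified_scores(payoff: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted multi-scenario score of each policy."""
    return [sum(w * s for w, s in zip(weights, row)) for row in payoff]
```

For instance, if one scenario yields lower scores than another for every policy, the scenario player's weight on it tends to 1, and the unified ranking reduces to performance in that scenario. With a symmetric payoff such as `[[1, 0], [0, 1]]`, the weights stay uniform. The approximation error of the averaged strategies shrinks as `rounds` grows and `eta` is reduced accordingly.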