
Research on Cross-Spatiotemporal Multimodal Fusion Networks for Driving Scenarios

Spatiotemporal Multimodal Fusion Network for Driving Scenario

Author: 熊子钰
  • Student ID
    2019******
  • Degree
    Master
  • Email
    xio******.cn
  • Defense date
    2022.05.18
  • Supervisor
    廖庆敏
  • Discipline
    Electronics and Communication Engineering
  • Pages
    75
  • Confidentiality level
    Public
  • Department
    023 Department of Electronic Engineering
  • Chinese keywords
    驾驶场景, 跨时空, 多模态, 目标检测, 轨迹预测
  • English keywords
    Driving Scenario, Spatio-Temporal, Multimodal, Object Detection, Trajectory Prediction

Abstract

In an autonomous driving system, the vehicle's perception of its environment and its prediction of future driving states are two essential functional modules. Driving scenes, however, involve complex interactive environments: adverse weather, object occlusion, and high-speed motion all degrade object detection performance and thereby impair the perception module. In addition, deployed autonomous vehicles must be able to trace autonomous decisions back to specific module functions, which the black-box nature of end-to-end networks makes difficult. To address these challenges, this thesis studies multimodal, cross-spatiotemporal networks for driving scenarios from three directions: sensor data fusion, temporal data association, and trajectory prediction based on cognitive mechanisms. The main contributions are as follows:

(1) A deep fusion network for Radar and LiDAR data. To meet the all-weather and robustness requirements of detection in driving scenarios, this thesis first compares the characteristics of mainstream sensors and selects Radar, which provides dynamic echoes, and LiDAR, which provides precise structural information, as the data sources. To handle Radar echoes that lack height information and carry substantial noise, a joint bird's-eye-view voxelization and point-cloud enhancement method is designed, and a deep fusion structure across modal features is built using spatial and channel attention (see the first sketch below), improving single-frame object detection.

(2) An object detection model based on foreground context extraction and cross-spatiotemporal semantic aggregation. To reduce the false and missed detections caused by occlusion and sparse point clouds, this thesis first designs a compact and efficient foreground context modeling method that strengthens foreground features at almost no additional computational cost. A two-stage graph neural network is then built, comprising an RoI-level message-passing network and a similarity-based cross-temporal graph convolution layer (see the second sketch below), so that information from adjacent frames improves detection accuracy and robustness on the target frame, achieving leading results on the nuScenes dataset.

(3) A trajectory prediction framework driven jointly by cognition and data. To satisfy the module-level traceability required by driving decision tasks, this thesis starts from the driver's cognitive process, maps cognitive mechanisms onto network modules, and improves trajectory prediction accuracy through efficient environment-interaction encoding and a data-driven pretrained model (see the third sketch below).

In summary, from the cross-spatiotemporal and multimodal perspectives, and targeting object detection and trajectory prediction in driving scenes, this thesis proposes an effective Radar/LiDAR fusion method, builds an effective spatiotemporal aggregation model for point-cloud video, and designs a theory-of-mind trajectory prediction architecture grounded in cognitive mechanisms. The methods achieve leading results on the large-scale driving datasets nuScenes and Argoverse, and are significant for both future research and practical deployment of autonomous driving.
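To make the attention-based fusion in contribution (1) concrete, here is a minimal PyTorch-style sketch of fusing LiDAR and Radar features with channel and spatial attention. It assumes both modalities have already been voxelized onto the same bird's-eye-view grid; the module names, reduction ratio, and kernel size are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical channel + spatial attention fusion of Radar and
    LiDAR BEV feature maps (illustrative only)."""

    def __init__(self, c_lidar: int, c_radar: int):
        super().__init__()
        c = c_lidar + c_radar
        # Channel attention: squeeze-and-excitation style gating,
        # letting the network reweight the two modalities per channel.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // 4, 1), nn.ReLU(),
            nn.Conv2d(c // 4, c, 1), nn.Sigmoid(),
        )
        # Spatial attention: a single-channel mask over the BEV grid.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(c, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, lidar_bev: torch.Tensor, radar_bev: torch.Tensor):
        # Both inputs: (B, C_i, H, W) on the same BEV grid.
        x = torch.cat([lidar_bev, radar_bev], dim=1)
        x = x * self.channel_gate(x)   # per-channel modality reweighting
        x = x * self.spatial_gate(x)   # emphasize informative BEV cells
        return x
```

Gating the concatenated features, rather than each modality separately, is one simple way to let noisy Radar channels be suppressed where LiDAR is reliable, and vice versa.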
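The similarity-based cross-temporal aggregation in contribution (2) can be illustrated as similarity-weighted message passing from an adjacent frame's RoI features to the current frame's. This is a hedged sketch assuming per-RoI feature vectors are available from both frames; the function name, temperature, and residual update are hypothetical.

```python
import torch
import torch.nn.functional as F

def cross_frame_aggregate(cur_feats, prev_feats, tau=0.1):
    """Hypothetical similarity-weighted aggregation of RoI features
    from a neighboring frame into the target frame.
    cur_feats: (N, D) RoIs in the target frame.
    prev_feats: (M, D) RoIs in the adjacent frame."""
    cur_n = F.normalize(cur_feats, dim=-1)
    prev_n = F.normalize(prev_feats, dim=-1)
    sim = cur_n @ prev_n.t() / tau       # (N, M) scaled cosine similarities
    weights = sim.softmax(dim=-1)        # attention over previous-frame RoIs
    messages = weights @ prev_feats      # (N, D) aggregated temporal context
    return cur_feats + messages          # residual update of the target frame

if __name__ == "__main__":
    cur = torch.randn(8, 128)            # 8 RoIs in the target frame
    prev = torch.randn(10, 128)          # 10 RoIs in the previous frame
    print(cross_frame_aggregate(cur, prev).shape)  # torch.Size([8, 128])
```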

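For contribution (3), the following sketch shows one plausible form of the "environment interaction encoding" step: each agent's motion history is encoded with a GRU, agents attend over one another with multi-head attention, and a linear decoder emits future waypoints. The architecture, dimensions, and names are assumptions for illustration, not the thesis network.

```python
import torch
import torch.nn as nn

class InteractionEncoder(nn.Module):
    """Hypothetical environment-interaction encoder for trajectory
    prediction (illustrative only)."""

    def __init__(self, d_model: int = 64, horizon: int = 12):
        super().__init__()
        self.history_enc = nn.GRU(input_size=2, hidden_size=d_model,
                                  batch_first=True)
        self.interact = nn.MultiheadAttention(d_model, num_heads=4,
                                              batch_first=True)
        self.decoder = nn.Linear(d_model, horizon * 2)  # (x, y) per step
        self.horizon = horizon

    def forward(self, histories: torch.Tensor):
        # histories: (A, T, 2) past (x, y) positions of A agents.
        _, h = self.history_enc(histories)   # (1, A, d): one state per agent
        ctx, _ = self.interact(h, h, h)      # agents attend to each other
        out = self.decoder(ctx.squeeze(0))   # (A, horizon * 2)
        return out.view(-1, self.horizon, 2) # predicted future waypoints

if __name__ == "__main__":
    model = InteractionEncoder()
    print(model(torch.randn(5, 20, 2)).shape)  # torch.Size([5, 12, 2])
```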