
Human Behavior Recognition Under Mixed Traffic Scenes for Autonomous Driving

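Author: 王思佳 (Wang Sijia)
  • Student ID
    2017******
  • Degree
    Ph.D.
  • Email
    wsj******.cn
  • Defense date
    2022.05.24
  • Advisor
    杨殿阁 (Yang Diange)
  • Discipline
    Mechanical Engineering
  • Pages
    189
  • Confidentiality level
    Public
  • Department
    015 School of Vehicle and Mobility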
  • Keywords
    Autonomous Driving, Behavior Recognition, Human Pose, Gesture Recognition, Multi-layer Information Fusion

Abstract

Correctly understanding the behaviors and intentions of humans in mixed traffic scenes is essential for environment perception and cognition in high-level autonomous driving. However, current automated vehicles struggle to recognize human behaviors in mixed traffic scenes systematically and comprehensively, and the accuracy, robustness, and completeness of their recognition remain insufficient. This research proposes a general paradigm for human traffic behavior cognition and studies on-board vision-based methods for human pose estimation and for recognizing typical behaviors and intentions, enabling automated vehicles to recognize human behaviors in mixed traffic scenes more accurately, robustly, and comprehensively.

First, a cognitive paradigm for human traffic behaviors is constructed, and a method for estimating human pose elements in traffic scenes is studied, laying a theoretical and practical foundation for recognizing the typical traffic behaviors of traffic police and pedestrians. The thesis analyzes the cognitive requirements of automated vehicles with respect to human traffic behaviors, studies methods for describing the behavior subject, the driving environment, and the ego vehicle, builds a general behavior mapping model, and verifies the rationality of the proposed paradigm by analyzing typical human traffic behaviors. To extract the human pose element accurately, heatmaps for limb-level and joint-level human representations are constructed, and a human pose estimation method for traffic scenes that fuses the limb-level and joint-level representations is proposed. For traffic command recognition, a simplified form of the cognitive paradigm is applied, and a traffic command recognition method based on multi-modal human pose is proposed.
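The abstract does not define the limb-level and joint-level heatmaps. As a hedged illustration only, the sketch below assumes a common construction: a joint-level heatmap is a 2D Gaussian centered on one keypoint, and a limb-level heatmap is a Gaussian of each pixel's distance to the segment joining two connected keypoints. The function names `joint_heatmap` and `limb_heatmap` and the `sigma` parameter are hypothetical, and the network that fuses the two levels is not shown.

```python
import numpy as np


def joint_heatmap(h, w, joint, sigma=2.0):
    """Joint-level heatmap (hypothetical): 2D Gaussian centred on joint=(x, y)."""
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    x0, y0 = joint
    d2 = (xs - x0) ** 2 + (ys - y0) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def limb_heatmap(h, w, p, q, sigma=2.0):
    """Limb-level heatmap (hypothetical): Gaussian of the distance to segment p-q."""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(float)  # (h, w, 2) pixel coords as (x, y)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    v = q - p
    length2 = float(v @ v)
    if length2 == 0.0:
        t = np.zeros((h, w))  # degenerate limb: both joints coincide
    else:
        # Projection parameter of each pixel onto the segment, clamped to [0, 1].
        t = np.clip(((pts - p) @ v) / length2, 0.0, 1.0)
    closest = p + t[..., None] * v  # nearest point on the segment for each pixel
    d2 = np.sum((pts - closest) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


# Example: a 32x32 map for a horizontal limb from (10, 10) to (20, 10).
limb = limb_heatmap(32, 32, (10, 10), (20, 10))
elbow = joint_heatmap(32, 32, (20, 10))
```

Arrays are indexed `[y, x]`. Because the limb map stays high along the whole segment, it encodes which joints are connected, information that isolated joint peaks lack; this is one plausible reason for fusing the two levels.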
Using both the coordinate form and the heatmap form of the human pose, upper-body keypoint geometric features and keypoint heatmap co-occurrence features are constructed, a recognition model is built on recurrent neural networks, and a dual-voting post-processing mechanism is proposed for online inference. As a result, continuous, accurate, and stable traffic police gesture recognition is achieved. Furthermore, a traffic command recognition method that distinguishes command directions is studied, and a two-stage recognition framework is proposed to extend simple gesture recognition to fine-grained command behavior recognition.

For pedestrian behavior and intention recognition, the cognitive paradigm is applied in full detail, and a pedestrian behavior and intention recognition method that fuses multi-layer information is proposed. The thesis builds a typical pose library from human pose data in traffic scenes to provide human pose priors, and designs a pose matching and correction mechanism to improve the reliability of the output poses. Combining sensory information with this prior information, a modular multi-task learning architecture that fuses multi-layer information about behavioral cognitive elements in a hybrid manner is proposed, achieving comprehensive and accurate recognition of pedestrian action-level and intention-level behaviors.

Finally, a human traffic behavior recognition database is built, and experimental verification is carried out on open and semi-open urban roads in China. In particular, the first large-scale human pose dataset for urban traffic scenes and a benchmark of state-of-the-art human pose estimation algorithms are established. A performance evaluation protocol that accounts for the needs of autonomous driving is designed, revealing the current status and limitations of applying existing human pose estimation research to traffic scenes.
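The abstract names a dual-voting post-processing mechanism for online inference without describing it. The class below is a hypothetical two-level voting filter, not the thesis's mechanism: a majority vote over a sliding window of per-frame gesture labels, followed by a second check that the window winner stays the same for several consecutive frames before the reported command changes. `DualVoteSmoother`, `window`, `confirm`, and the `"none"` initial label are assumed names.

```python
from collections import Counter, deque


class DualVoteSmoother:
    """Hypothetical two-level voting filter for per-frame gesture labels.

    Level 1: majority vote over the most recent `window` frame predictions.
    Level 2: the emitted label changes only after the level-1 winner has
    been the same for `confirm` consecutive frames.
    """

    def __init__(self, window=5, confirm=3, initial="none"):
        self.frames = deque(maxlen=window)  # sliding window of raw labels
        self.confirm = confirm
        self.output = initial      # label currently reported downstream
        self.candidate = initial   # latest level-1 winner
        self.streak = 0            # consecutive frames the candidate has won

    def update(self, label):
        """Feed one per-frame prediction and return the smoothed label."""
        self.frames.append(label)
        # Level 1: most frequent label in the window; ties go to the label
        # encountered first, i.e. the oldest one in the window.
        winner = Counter(self.frames).most_common(1)[0][0]
        # Level 2: only switch the output once the winner is stable.
        if winner == self.candidate:
            self.streak += 1
        else:
            self.candidate = winner
            self.streak = 1
        if self.candidate != self.output and self.streak >= self.confirm:
            self.output = self.candidate
        return self.output


smoother = DualVoteSmoother(window=5, confirm=3)
labels = ["stop"] * 6 + ["left"] + ["stop"] * 2  # one spurious "left" frame
smoothed = [smoother.update(x) for x in labels]
```

In this example the single spurious `"left"` frame never reaches the output, and the first two frames still report `"none"` until the `"stop"` winner has been confirmed for three frames. The price of this stability is latency: a genuine change of command is reported only a few frames after it begins.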
The experimental results show that, building on the cognitive paradigm for human traffic behaviors, the proposed human pose estimation method for traffic scenes, traffic police command recognition method, and pedestrian behavior and intention recognition method achieve better accuracy, robustness, and completeness.
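The abstract does not say how detected poses are matched against the typical pose library or how they are corrected. The sketch below is one hedged possibility, assumed rather than taken from the thesis: keypoints below a confidence threshold are treated as unreliable, the reliable ones are aligned to each template by translation and scale (no rotation), the nearest template within a distance limit is selected, and unreliable joints are filled in from that template. `match_and_correct`, `conf_thresh`, `max_dist`, and the toy two-template library are hypothetical.

```python
import numpy as np


def _norm_params(points):
    """Centroid and RMS radius of a set of (x, y) points."""
    center = points.mean(axis=0)
    scale = float(np.sqrt(((points - center) ** 2).sum(axis=1).mean()))
    return center, (scale if scale > 0 else 1.0)


def match_and_correct(pose, conf, library, conf_thresh=0.3, max_dist=0.5):
    """Hypothetical pose matching and correction against a pose library.

    pose: (K, 2) detected keypoints; conf: (K,) keypoint confidences;
    library: (N, K, 2) template poses. Returns (corrected_pose, template_index),
    with template_index None when no template is close enough.
    """
    pose = np.asarray(pose, dtype=float)
    conf = np.asarray(conf, dtype=float)
    library = np.asarray(library, dtype=float)
    reliable = conf >= conf_thresh
    if reliable.sum() < 2:
        return pose.copy(), None  # too few trusted joints to align
    c, s = _norm_params(pose[reliable])
    det = (pose[reliable] - c) / s
    best, best_d, best_cs = None, np.inf, None
    for i, tmpl in enumerate(library):
        # Normalize each template on the same reliable joints as the detection.
        tc, ts = _norm_params(tmpl[reliable])
        d = float(np.linalg.norm((tmpl[reliable] - tc) / ts - det, axis=1).mean())
        if d < best_d:
            best, best_d, best_cs = i, d, (tc, ts)
    if best is None or best_d > max_dist:
        return pose.copy(), None
    tc, ts = best_cs
    corrected = pose.copy()
    # Map the matched template into image coordinates and fill unreliable joints.
    corrected[~reliable] = (library[best][~reliable] - tc) / ts * s + c
    return corrected, best


library = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],  # square-like template
    [(0, 0), (1, 0), (2, 0), (3, 0)],  # straight-line template
]
detected = [(100, 50), (110, 50), (110, 60), (0, 0)]  # last joint is spurious
fixed, idx = match_and_correct(detected, [0.9, 0.9, 0.9, 0.1], library)
```

Aligning only on the reliable joints keeps a single wildly misplaced low-confidence keypoint from distorting the centroid and scale used for matching. A real pose library would hold many templates and would likely also need rotation handling, which this sketch omits.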