Research on Stereo Vision Algorithm Based on Global Information Fusion and Cross-Matching

Original title: 基于全局信息融合与交叉匹配的双目立体视觉算法研究

Author: Zhang Yidi (张祎頔)
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    102******com
  • Defense date
    2023.05.13
  • Advisor
    Yang Wenming (杨文明)
  • Discipline
    Electronic Information
  • Pages
    50
  • Confidentiality level
    Public
  • Department
    599 International Graduate School
  • Keywords
    Binocular stereo vision, Structural similarity, Cross-matching, Attention mechanism

Abstract

Binocular stereo vision is among the most classical and widely used passive sensing techniques for depth estimation, with strong application prospects and research value. However, existing stereo vision algorithms still make serious prediction errors in ill-posed regions such as occlusions and weakly textured areas. In addition, the correlation volumes they construct suffer from mismatches and multi-peak interference, which degrades disparity estimation. To address these problems, this thesis studies stereo vision algorithms based on global information fusion and cross-matching, with the aim of improving estimation in ill-posed regions and of fully exploiting the matching relationships between views to optimize correlation volume construction.

First, this thesis proposes SAGIF-Stereo, a stereo vision algorithm that combines global information with structural similarity to improve disparity estimation in ill-posed regions such as occluded and weakly textured areas. The proposed Conv-Trans feature extraction module builds on a convolutional network and integrates a Transformer structure to supply global information to ill-posed regions. A parallel dual-branch structure extracts local and global features, and transformation units adaptively adjust the feature domain to promote feature fusion. Together these strengthen the network's ability to capture multi-level features, from fine texture detail to global structure. In addition, a structure-aware module based on a generalized attention mechanism is introduced into the disparity optimization stage; it consists of a structural similarity module and a structure aggregation module. The module learns structural similarity from the scene's context features to obtain a structure-aware attention matrix, and then aggregates correlation features according to that matrix, so that ill-posed regions receive accurate disparity information from structurally similar regions. Results on multiple real-world datasets show that SAGIF-Stereo achieves leading cross-dataset generalization and effectively improves estimation in ill-posed regions.

Second, this thesis proposes a stereo vision algorithm based on cross-matching to optimize the correlation volume. The global matching module uses dual-stream symmetric Cross-Former Blocks to cross-aggregate the binocular features and establish interaction between the two views. This fully exploits the matching relationships between binocular features: it enlarges the receptive field of the features while attending to inter-view correlation information and suppressing interference from redundant information. Furthermore, a matching classification task is built on top of the disparity regression task. A matching focal loss applied directly to the correlation volume constrains it to be unimodal (single-peaked), further improving its accuracy and robustness. Experimental results show that the proposed global matching module and matching focal loss significantly reduce mismatches in the correlation volume, and that SAGIF-GMM, the network that combines them with SAGIF-Stereo, achieves excellent performance on multiple datasets.
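The structure-aware aggregation described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the thesis's generalized attention, so the code below assumes plain scaled dot-product attention over per-pixel context features. The function names `structure_attention` and `aggregate`, and the toy data, are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def structure_attention(context):
    """Row i holds the attention weights of pixel i over all pixels.

    Assumption: similarity is a scaled dot product of context features;
    the thesis may use a different similarity measure.
    """
    scale = 1.0 / math.sqrt(len(context[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in context])
        for q in context
    ]

def aggregate(attention, correlation):
    """Attention-weighted sum of per-pixel correlation feature vectors."""
    dim = len(correlation[0])
    return [
        [sum(w * feat[c] for w, feat in zip(row, correlation)) for c in range(dim)]
        for row in attention
    ]

# Toy scene: pixels 0 and 1 share the same structure, pixel 2 does not.
context = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
# Pixel 1 stands in for an ill-posed pixel: its own correlation feature is all zeros.
correlation = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

attention = structure_attention(context)
aggregated = aggregate(attention, correlation)
```

In this toy case the ill-posed pixel 1 takes its aggregated correlation feature almost entirely from pixels 0 and 1, which share its context, and almost nothing from pixel 2.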
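To show why a single-peaked correlation volume matters for disparity regression, here is a minimal sketch for one pixel. It assumes the common soft-argmin formulation (expected disparity under a softmax over matching scores) and a generic focal loss on the true disparity bin. The abstract does not state the thesis's exact loss, so `matching_focal_loss`, the value `gamma=2.0`, and the toy scores are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of matching scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_argmin_disparity(probs):
    """Disparity regression: expected disparity under the matching distribution."""
    return sum(d * p for d, p in enumerate(probs))

def matching_focal_loss(probs, true_disparity, gamma=2.0, eps=1e-12):
    """Focal-style classification loss on the true disparity bin.

    Assumption: generic focal loss form, not necessarily the thesis's definition.
    """
    p_t = probs[true_disparity]
    return -((1.0 - p_t) ** gamma) * math.log(p_t + eps)

TRUE_DISPARITY = 2

# Matching scores over 8 disparity candidates for one pixel.
multi_peak = softmax([0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 3.0, 0.0])   # false peak at 6
single_peak = softmax([0.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # unimodal

d_multi = soft_argmin_disparity(multi_peak)
d_single = soft_argmin_disparity(single_peak)
loss_multi = matching_focal_loss(multi_peak, TRUE_DISPARITY)
loss_single = matching_focal_loss(single_peak, TRUE_DISPARITY)
```

With two peaks, the regressed disparity is pulled toward the midpoint between them and away from the true value, while the unimodal profile regresses close to the true disparity and incurs a much smaller classification loss. This is the effect a direct constraint on the correlation volume is meant to encourage.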