Research on Multi-view 3D Reconstruction Based on Geometric Consistency and Window Attention

Author: 廖晋立
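  • Student ID
    2020******
  • Degree
    Master's
  • Email
    893******com
  • Defense date
    2023.05.12
  • Advisor
    张凯
  • Discipline
    Electronic Information
  • Pages
    84
  • Confidentiality level
    Public
  • Training unit
    599 International Graduate School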
  • Keywords
    multi-view stereo, geometric consistency, feature matching, regularization

Abstract

3D reconstruction is one of the most active research directions in computer vision. Building 3D models from 2D images makes it easier to observe, understand, and analyze target objects, so 3D reconstruction is widely used in urban digitalization, cultural relic preservation, virtual reality, and autonomous driving. Traditional active 3D reconstruction must scan the scene directly to obtain its geometric structure. Passive 3D reconstruction needs no on-site scanning, only 2D images taken with an ordinary camera, which lowers both the equipment requirements and the cost of reconstruction. Among passive methods, deep-learning-based multi-view 3D reconstruction algorithms achieve the best results, but many problems remain unsolved. This thesis addresses the difficulties these algorithms encounter during reconstruction, with the aim of improving the accuracy and completeness of the reconstructed model. The main contributions are as follows:

(1) The thesis examines geometric consistency constraints in multi-view 3D reconstruction and proposes a multi-view 3D reconstruction algorithm based on them. An estimated geometric consistency mask constrains the geometric consistency of the probability volume and is also used to dynamically adjust the depth sampling range of the next stage. In addition, a geometric consistency loss function constrains depth map estimation from different viewpoints, improving the accuracy and completeness of the reconstructed model.

(2) For scenes with complex structure, the thesis proposes a multi-view 3D reconstruction algorithm based on feature matching optimization. A feature matching algorithm built on window attention alternates self-attention within each feature map and cross-attention between feature maps to match features in the region near the epipolar line. A matching cost volume aggregation scheme is also introduced. Together these improve the accuracy and completeness of reconstruction for scenes with complex texture and structure.

(3) For weakly textured and textureless regions, where reconstruction quality is poor, the thesis studies how the receptive field size of the regularization network affects reconstruction quality. It proposes a multi-view 3D reconstruction algorithm with multi-scale global receptive field regularization: a window-based 3D Transformer regularization network on a multi-scale UNet structure. The global receptive field of this network improves the completeness and smoothness of the reconstructed model.

All three contributions are evaluated on the standard multi-view 3D reconstruction datasets DTU, Tanks and Temples, and BlendedMVS, where they show considerable improvements in accuracy and completeness over state-of-the-art algorithms. Ablation experiments on each optimization module confirm that every module contributes positively to reconstruction quality.
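The geometric consistency idea in contribution (1) can be illustrated with the forward-backward reprojection check that is commonly used in multi-view stereo. The sketch below is not the thesis's mask estimator, which the abstract does not specify. It assumes pinhole cameras, world-to-camera extrinsics, and nearest-neighbor depth lookup; the function name `geometric_consistency_mask` and the thresholds `pix_thresh` and `depth_thresh` are hypothetical.

```python
import numpy as np

def geometric_consistency_mask(depth_ref, K_ref, T_ref, depth_src, K_src, T_src,
                               pix_thresh=1.0, depth_thresh=0.01):
    """Forward-backward reprojection check between two views (illustrative only).

    depth_*: (H, W) depth maps, K_*: (3, 3) intrinsics,
    T_*: (4, 4) world-to-camera extrinsics.
    Returns an (H, W) boolean mask over the reference view.
    """
    h, w = depth_ref.shape
    hs, ws = depth_src.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    d_ref = depth_ref.reshape(-1).astype(np.float64)

    # Back-project reference pixels to 3D and move them into the source frame.
    pts_ref = (np.linalg.inv(K_ref) @ pix) * d_ref[None, :]
    ref_to_src = T_src @ np.linalg.inv(T_ref)
    pts_src = ref_to_src[:3, :3] @ pts_ref + ref_to_src[:3, 3:4]

    # Project into the source image and look up the nearest source depth.
    proj = K_src @ pts_src
    z = proj[2]
    z_safe = np.where(z > 1e-8, z, 1.0)
    us = np.rint(proj[0] / z_safe).astype(int)
    vs = np.rint(proj[1] / z_safe).astype(int)
    inside = (z > 1e-8) & (us >= 0) & (us < ws) & (vs >= 0) & (vs < hs)
    us_c = np.clip(us, 0, ws - 1)
    vs_c = np.clip(vs, 0, hs - 1)
    d_src = depth_src[vs_c, us_c].astype(np.float64)

    # Back-project the matched source pixels and return to the reference frame.
    pix_src = np.stack([us_c, vs_c, np.ones_like(us_c)], axis=0).astype(np.float64)
    pts_src2 = (np.linalg.inv(K_src) @ pix_src) * d_src[None, :]
    src_to_ref = np.linalg.inv(ref_to_src)
    pts_back = src_to_ref[:3, :3] @ pts_src2 + src_to_ref[:3, 3:4]
    proj_back = K_ref @ pts_back
    d_back = proj_back[2]
    d_back_safe = np.where(d_back > 1e-8, d_back, 1.0)
    u_back = proj_back[0] / d_back_safe
    v_back = proj_back[1] / d_back_safe

    # A pixel is consistent if it lands near where it started, at a similar depth.
    pix_err = np.hypot(u_back - pix[0], v_back - pix[1])
    rel_err = np.abs(d_back - d_ref) / np.maximum(d_ref, 1e-8)
    mask = (inside & (d_back > 1e-8) & (d_ref > 0)
            & (pix_err < pix_thresh) & (rel_err < depth_thresh))
    return mask.reshape(h, w)
```

One plausible use of such a mask, in the spirit of the abstract, is to keep a narrow depth sampling range at the next stage only for pixels marked consistent and a wider range elsewhere. How the thesis actually couples the mask to the probability volume and the sampling range is not stated here.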