
动态三维场景理解与重建

Dynamic 3D Scene Understanding and Reconstruction

Author: Huang Jiahui (黄家晖)
  • Student ID
    2018******
  • Degree
    Doctoral
  • Email
    462******com
  • Defense date
    2023.05.15
  • Advisor
    Hu Shimin
  • Discipline
    Computer Science and Technology
  • Pages
    166
  • Confidentiality level
    Public
  • Department
    024 Department of Computer Science and Technology
  • Keywords (Chinese)
    动态三维场景, 运动理解, 动态定位与建图, 点云注册, 表面重建
  • Keywords (English)
    Dynamic 3D scene, Motion understanding, Dynamic SLAM, Point cloud registration, Surface reconstruction

Abstract


With the increasingly tight integration of Artificial Intelligence (AI) algorithms into daily life and production, more and more complex AI-related applications are being demanded. Among them, accurate perception of the 3D environment surrounding an agent is a necessary foundation for planning, decision-making, and execution. However, most real-world scenes change over time, so an algorithm must accurately describe not only the static part of a scene but also its dynamic part, such as object motion trajectories, deformations, and multi-target associations. Additionally, corresponding processing pipelines must be designed for different sensor inputs to make full use of the available information and build accurate 3D models for downstream tasks. In response to these research challenges, this dissertation designs a series of algorithms for dynamic 3D scene understanding and reconstruction, targeting different input modalities and types of dynamic scenes, with the goal of establishing dense 3D geometry. The main research content and contributions include:

1. The first general-purpose, cross-scene, real-time algorithm for dynamic localization and mapping. By clustering multi-object motions in input stereo videos, it rapidly and accurately recovers camera poses and the trajectories of moving rigid objects. To efficiently segment moving targets from cluttered landmarks in stereo videos, a clustering-based heterogeneous conditional random field model integrates various priors to establish correspondences among multiple targets. Sliding-window optimization is used for state estimation and improves the recovered quality of both the static and dynamic parts of the scene.

2. A multibody segmentation and joint registration method for dynamic 3D point clouds. It addresses the over-reliance of traditional motion segmentation on semantic information, as well as the global inconsistency of multi-scan outputs. A 3D scene-flow registration representation is adopted to segment rigid moving parts using correspondences alone. Weighted permutation synchronization and motion segmentation synchronization algorithms are designed, with corresponding optimality proofs, to minimize inconsistency errors. The algorithm achieves high accuracy on point cloud registration, motion segmentation, and other evaluation benchmarks.

3. A joint registration framework for non-rigid point clouds based on the functional map representation, which effectively handles incompleteness and occlusion in the input. A novel representation expresses non-rigid correspondences as linear mappings between basis functions defined over the point clouds, turning high-dimensional, complex point-level matching into low-dimensional, linear function matching. An iterative optimization-based synchronization module enforces cycle consistency, recovering accurate 3D scene flow between arbitrary pairs of point clouds. Extensive experiments demonstrate that this framework is more accurate and robust than previous methods.

4. A surface reconstruction method for dynamic scenes based on a neural Galerkin formulation, which reconstructs surface geometry from 3D point clouds. To address the inability of traditional reconstruction methods to handle significant noise and incompleteness, and the limited surface-fitting accuracy of deep-learning-based methods, local basis function distributions are learned in a data-driven manner, and a linear system minimizing the fitting error is assembled and solved using finite element analysis. The method effectively controls error for both large-scale static scenes and dynamic sequences, producing accurate reconstructions.
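To give a concrete flavor of the first contribution's core step, the sketch below clusters landmark correspondences between two frames into rigid motions. It is a minimal, generic sequential-RANSAC toy with a Kabsch fit, not the thesis' heterogeneous conditional random field model; all names and thresholds here are illustrative assumptions.

```python
import numpy as np

def kabsch(P, Q):
    """Best-fit rigid transform (R, t) with P @ R.T + t ~= Q."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def segment_motions(P, Q, thresh=0.05, iters=200, seed=0):
    """Greedy sequential RANSAC over rigid motions: repeatedly fit a
    rigid transform to the remaining correspondences and peel off its
    inliers as one motion cluster (background first, then objects)."""
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(P))
    labels = -np.ones(len(P), dtype=int)        # -1 marks unassigned points
    k = 0
    while len(remaining) >= 3:
        best = np.empty(0, dtype=int)
        for _ in range(iters):
            s = rng.choice(remaining, 3, replace=False)
            R, t = kabsch(P[s], Q[s])
            err = np.linalg.norm(P[remaining] @ R.T + t - Q[remaining], axis=1)
            inliers = remaining[err < thresh]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < 3:
            break
        labels[best] = k
        remaining = np.setdiff1d(remaining, best)
        k += 1
    return labels
```

On synthetic noiseless data (e.g., 60 "background" landmarks moved by one rigid transform and 30 "object" landmarks by another), this recovers the two motion clusters; the thesis method additionally fuses priors and temporal consistency that this toy omits.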
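The permutation synchronization idea in the second contribution can be illustrated with the classic spectral formulation: given pairwise (possibly redundant) point maps between scans, recover one consistent map per scan. This is a plain unweighted sketch under a noiseless-input assumption, not the thesis' weighted algorithm with optimality proofs.

```python
import numpy as np

def synchronize_permutations(pairwise, m, n):
    """Spectral permutation synchronization: stack the pairwise n x n
    permutation matrices into one symmetric (m*n) x (m*n) block matrix,
    take its top-n eigenvectors, and read off one map per point cloud
    relative to cloud 0. Rounding is a simple row-wise argmax, which
    assumes near-noiseless input maps."""
    W = np.eye(m * n)                       # diagonal blocks are identity
    for (i, j), P in pairwise.items():
        W[i*n:(i+1)*n, j*n:(j+1)*n] = P
        W[j*n:(j+1)*n, i*n:(i+1)*n] = P.T   # keep W symmetric
    _, vecs = np.linalg.eigh(W)
    U = vecs[:, -n:]                        # eigenspace of the n largest eigenvalues
    anchor = U[:n]                          # block of cloud 0 fixes the gauge
    Q = []
    for i in range(m):
        M = U[i*n:(i+1)*n] @ anchor.T       # soft map: cloud 0 -> cloud i
        P = np.zeros((n, n))
        P[np.arange(n), M.argmax(axis=1)] = 1.0
        Q.append(P)
    return Q  # Q[i] @ Q[j].T reproduces a cycle-consistent map cloud j -> cloud i
```

Because every returned map is expressed relative to cloud 0, all composed maps `Q[i] @ Q[j].T` are cycle-consistent by construction, which is exactly the global-inconsistency issue synchronization resolves.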
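The functional map representation used in the third contribution reduces point-level matching to a small linear solve. The sketch below is the textbook least-squares version (assuming orthonormal bases and given corresponding descriptor functions), not the thesis' full framework with its synchronization module.

```python
import numpy as np

def fit_functional_map(Phi_X, Phi_Y, F, G):
    """Least-squares functional map: express corresponding descriptor
    functions F (on X) and G (on Y) in the reduced orthonormal bases
    Phi_X, Phi_Y, then solve C @ A ~= B for the small k x k map C
    between the bases instead of matching points directly."""
    A = Phi_X.T @ F                                 # k x d coefficients on X
    B = Phi_Y.T @ G                                 # k x d coefficients on Y
    Ct, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)  # solves A.T @ C.T = B.T
    return Ct.T

def map_points(C, Phi_X, Phi_Y):
    """Turn the functional map back into point correspondences: the image
    of a delta function at point i of X has coefficients C @ Phi_X[i], so
    match it to the nearest spectral embedding row of Y."""
    img = Phi_X @ C.T                               # n_X x k
    d2 = ((img[:, None, :] - Phi_Y[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)                        # for each X point, its Y index
```

The point is the dimensionality reduction the abstract describes: with k basis functions (k typically tens), the unknown is a k x k matrix rather than an n x n correspondence, and cycle consistency of many such C matrices can then be enforced in the low-dimensional space.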
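Finally, the "assemble and solve a linear system minimizing the fitting error" step of the fourth contribution can be shown in miniature with fixed Gaussian basis functions standing in for the learned, data-driven bases; the hierarchy, learned local bases, and finite-element machinery of the actual method are omitted. All parameters here are illustrative.

```python
import numpy as np

def galerkin_fit(points, values, centers, sigma=0.5):
    """Galerkin-style least-squares fit of an implicit function: write
    f(x) = sum_k c_k * phi_k(x) over fixed Gaussian basis functions
    (an illustrative stand-in for learned bases) and solve the normal
    equations (Phi^T Phi) c = Phi^T v for the coefficients c."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))            # N x K basis matrix
    G = Phi.T @ Phi + 1e-8 * np.eye(len(centers))     # regularized Gram matrix
    return np.linalg.solve(G, Phi.T @ values)

def eval_implicit(query, centers, coeffs, sigma=0.5):
    """Evaluate the fitted implicit function at query locations."""
    d2 = ((query[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ coeffs
```

Fitting the signed distance of a 2D circle from ring samples, for example, yields values near 0 on the circle and negative values inside; extracting the zero level set of such a fitted implicit function is what produces the reconstructed surface.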