医学图像配准是数字化手术设计中的关键技术,通过对不同设备、不同时间或不同条件下获取的多模态、多时相医疗数据进行配准融合,可以获得更全面的患者信息。基于深度学习的医学图像配准方法近年来发展迅速,然而该领域仍面临一些挑战:首先,医学图像配准数据的标注繁琐,且需要丰富的专家知识,可用于训练的配准标注数据很少;其次,当不同模态间外观差异较大时,模态迁移误差会导致配准质量下降;最后,当不同时相数据间形态差异较大时,配准精度有限。因此,本论文围绕上述难点,分别研究基于深度学习的多模态医学图像配准和多时相医学图像配准,以提升配准质量。

针对模态差异较大的 CT(电子计算机断层扫描)和 MRI(磁共振成像)间的三维图像配准问题,本文提出一种基于加速扩散模型和双向置信模态迁移的多模态配准框架,设计权重共享的三分支生成对抗网络(Generative Adversarial Network,GAN)和加速的去噪扩散模型,通过 GAN 加速扩散模型采样并提升模态迁移质量,从而将多模态配准简化为单模态问题,消除不同模态间的外观差异。针对医学训练数据难以获得、模型易过拟合的问题,本文设计一种基于形状约束的配准模型预训练方法,通过数据增广合成具有任意对比度和形状的图像用于预训练,以保证模型的泛化能力。针对模态迁移生成图像可信度不足的问题,本文设计一种双向置信配准网络,估计双向模态迁移对配准的贡献。在腹部数据集上的实验结果表明,本文方法的配准精度高于现有方法。

针对不同呼吸时相肺部 CT 图像中组织结构形变大的问题,本文设计一种基于自监督表征学习的稀疏配准算法,设计基于 Swin-Transformer 的嵌入提取网络,并引入体素级对比学习框架以解决数据标注稀缺的问题,从而捕获长距离形变,编码不同解剖部位的语义信息,高效提取鲁棒嵌入,克服呼吸运动中图像对比度动态变化造成的困难。针对现有多分辨率架构网络易陷入局部最优的问题,本文提出一种稀疏采样指导的无监督配准网络,通过在关键点处预测稀疏位移热图,有效融合额外的语义指导。实验表明,与现有多时相肺部 CT 配准方法相比,本文方法可以达到更高的目标配准精度。
Medical image registration is a key technology in digital surgical design. By registering and fusing multi-modal and multi-temporal medical data acquired with different devices, at different times, or under different conditions, more comprehensive patient information can be obtained. Deep-learning-based medical image registration has developed rapidly in recent years, but the field still faces several challenges. First, annotating registration data is tedious and requires extensive expert knowledge, so little annotated data is available for training. Second, when the appearance gap between modalities is large, modality-translation errors degrade registration quality. Third, when the morphological difference between multi-temporal data is large, registration accuracy is limited. This thesis therefore addresses these difficulties by studying deep-learning-based multi-modal and multi-temporal medical image registration to improve registration quality.

For 3D registration between CT (computed tomography) and MRI (magnetic resonance imaging), where the modality difference is large, this thesis proposes a multi-modal registration framework based on an accelerated diffusion model and bidirectional confidence-aware modality translation. A weight-sharing three-branch generative adversarial network (GAN) and an accelerated denoising diffusion model are designed; the GAN accelerates diffusion sampling and improves the quality of modality translation, so that multi-modal registration is reduced to a single-modal problem and the appearance differences between modalities are eliminated. To address the scarcity of medical training data and the resulting tendency to overfit, a shape-constrained pre-training method for the registration model is designed: images with arbitrary contrasts and shapes are synthesized by data augmentation for pre-training, preserving the generalization ability of the model. To address the limited credibility of images generated by modality translation, a bidirectional confidence-aware registration network is designed to estimate the contribution of each translation direction to the registration. Experimental results on an abdominal dataset show that the registration accuracy of the proposed method exceeds that of existing methods.

For the large tissue deformation in lung CT images acquired at different respiratory phases, this thesis designs a sparse registration algorithm based on self-supervised representation learning. An embedding extraction network based on Swin-Transformer is designed, and a voxel-level contrastive learning framework is introduced to cope with scarce data annotation, so that long-range deformations can be captured and the semantic information of different anatomical regions can be encoded; robust embeddings are thus extracted efficiently, overcoming the difficulty caused by dynamically changing image contrast during respiratory motion. To address the tendency of existing multi-resolution architectures to fall into local optima, this thesis proposes an unsupervised registration network guided by sparse sampling, which effectively incorporates additional semantic guidance by predicting sparse displacement heatmaps at key points. Experiments show that the proposed method achieves higher target registration accuracy than existing multi-temporal lung CT registration methods.
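To make the idea of bidirectional confidence-aware registration concrete, the following is a minimal illustrative sketch, not the thesis implementation: it assumes hypothetical shapes and names, and simply fuses two displacement fields (one predicted after translating CT to MRI, one after translating MRI to CT) with voxel-wise confidence weights.

```python
# Illustrative sketch only (hypothetical interface and shapes); the actual
# bidirectional confidence-aware network in the thesis may differ.
import torch

def fuse_bidirectional(flow_a2b, flow_b2a, conf_a2b, conf_b2a):
    """flow_*: (B, 3, D, H, W) displacement fields predicted from the two
    translation directions; conf_*: (B, 1, D, H, W) raw confidence scores."""
    # Normalize the two confidence maps into voxel-wise weights that sum to 1.
    weights = torch.softmax(torch.cat([conf_a2b, conf_b2a], dim=1), dim=1)
    # Weighted combination of the two candidate displacement fields.
    return weights[:, 0:1] * flow_a2b + weights[:, 1:2] * flow_b2a
```

In this sketch the fused field is a per-voxel convex combination, so wherever one translation direction is judged unreliable, the displacement predicted from the other direction dominates.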
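Similarly, the voxel-level contrastive learning mentioned for the multi-temporal method can be illustrated with a small InfoNCE-style loss. This is a sketch under assumed shapes and names, not the thesis code: embeddings sampled at corresponding voxels of two respiratory phases act as positive pairs, and all other sampled voxels act as negatives.

```python
# Illustrative sketch only (assumed interface): voxel-level InfoNCE-style loss.
import torch
import torch.nn.functional as F

def voxel_contrastive_loss(emb_fixed, emb_moving, pos_idx, temperature=0.1):
    """emb_fixed, emb_moving: (N, C) embeddings at N sampled voxel locations;
    pos_idx: (N,) long tensor, index into emb_moving of the voxel that
    corresponds to emb_fixed[i]."""
    f = F.normalize(emb_fixed, dim=1)          # unit-length embeddings
    m = F.normalize(emb_moving, dim=1)
    logits = f @ m.t() / temperature           # (N, N) cosine-similarity logits
    return F.cross_entropy(logits, pos_idx)    # positives on the given indices
```

Because the loss needs only voxel correspondences (e.g., from synthetic deformations) rather than manual labels, it fits the label-scarce setting described above.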