The rapid development of information technology has driven the transition from traditional to digital pathology and given rise to the emerging field of computational pathology. Computational pathology applies computational methods to analyze histopathological images and study pathology-related problems. Its aim is to help pathologists make histopathological examination more efficient and objective, and to relieve pain points in clinical histopathology practice. However, fine-grained annotation of histopathological images is very difficult, and whole slide images (WSIs) are enormous. These problems pose considerable challenges to the design and clinical translation of computational methods for histopathological image analysis. This thesis therefore focuses on histopathological image analysis and adopts deep learning as its main approach. It presents systematic, in-depth studies of two fundamental and crucial problems in computational pathology: extracting visual representations from histopathological images, and analyzing weakly annotated WSIs. The main research contents and contributions are as follows.

Firstly, for extracting visual representations from histopathological images with category annotations, a representation learning method based on deep metric learning, DML-MA, is proposed. DML-MA characterizes both the appearance and the medical meaning of histopathological images, and it also yields a reasonable and reliable quantitative measure of image similarity. It can therefore be used directly for histopathological image retrieval.

Secondly, for extracting visual representations from unlabeled histopathological images, a representation learning method based on self-supervised learning, CS-CO, is proposed. CS-CO takes advantage of prior knowledge from histopathology and trains a deep neural network with a two-stage self-supervised task. The task consists of cross-stain prediction and contrastive learning, which guide the model to capture both low-level general features and high-level semantic features from images. The visual representations extracted by CS-CO can be used effectively in a variety of computational pathology tasks, such as tissue classification, cancer prognosis, and tumor subtyping.

Thirdly, for the analysis of weakly annotated WSIs, a weakly supervised learning method based on a multi-scale self-attention mechanism, PTMIL, is proposed. PTMIL uses large field-of-view images as its basic units of analysis and applies intra-scale and inter-scale self-attention to model both the spatial correlations and the cross-scale correlations of tissue morphological features. PTMIL not only achieves glioma diagnosis and subtyping performance comparable to that of pathologists, but also predicts molecular-level information, such as IDH gene mutation and MGMT promoter methylation, from WSIs.

Following the research philosophy of “data-driven, knowledge-guided, algorithm-empowered”, this thesis explores fundamental and crucial scientific problems in computational pathology. Together, these studies form a general and effective deep learning framework for histopathological image analysis. The framework reduces the demand for labeled data and makes more efficient use of histopathological image data. It has important theoretical significance for computational pathology and considerable application value in clinical histopathology practice.
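The idea of modeling spatial and cross-scale correlations with intra-scale and inter-scale self-attention, as described for PTMIL, can be illustrated with a minimal NumPy sketch. This is a generic illustration under stated assumptions, not PTMIL's actual architecture: the input is assumed to be pre-extracted features of shape (S, N, D) for S magnification scales and N spatially aligned regions, each attention step is single-head with no positional encoding, the weights are random rather than learned, and slide-level pooling is a plain mean. The function names (`self_attention`, `intra_inter_scale_block`, `slide_logits`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention over the second-to-last axis.
    # x: (..., L, D); returns outputs (..., L, D) and attention weights (..., L, L).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


def intra_inter_scale_block(feats, params):
    # feats: (S, N, D) -- assumed S scales x N aligned regions x D feature dims.
    # Intra-scale step: regions attend to each other within the same scale
    # (spatial correlation), with a residual connection.
    intra, _ = self_attention(feats, *params["intra"])
    h = feats + intra
    # Inter-scale step: at each region, the S scales attend to each other
    # (cross-scale correlation), again with a residual connection.
    hs = np.swapaxes(h, 0, 1)  # (N, S, D)
    inter, _ = self_attention(hs, *params["inter"])
    hs = hs + inter
    return np.swapaxes(hs, 0, 1)  # back to (S, N, D)


def slide_logits(feats, params, w_cls):
    # Hypothetical slide-level head: mean-pool over scales and regions, then a
    # linear classifier. w_cls: (D, num_classes).
    h = intra_inter_scale_block(feats, params)
    pooled = h.mean(axis=(0, 1))  # (D,)
    return pooled @ w_cls  # (num_classes,)


# Usage with random toy data (no real slides or trained weights involved).
rng = np.random.default_rng(0)
S, N, D, C = 3, 16, 32, 2
feats = rng.standard_normal((S, N, D))
params = {
    name: tuple(rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    for name in ("intra", "inter")
}
w_cls = rng.standard_normal((D, C)) / np.sqrt(D)
logits = slide_logits(feats, params, w_cls)
print(logits.shape)  # (2,)
```

Attending over the region axis within each scale captures spatial relationships, while swapping the first two axes and attending over the scale axis at each region captures relationships across magnifications. A trained model would learn the projection matrices and would typically add multiple heads, normalization, and a learned pooling step.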