
Research on Medical Image Aided Diagnosis Algorithm Based on Federated Learning

Author: Kong Fei (孔飞)
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    kf2******.cn
  • Defense date
    2023.05.10
  • Supervisor
    Dong Yuhan (董宇涵)
  • Discipline
    Biomedical Engineering
  • Pages
    70
  • Confidentiality level
    Public
  • Affiliation
    599 International Graduate School
  • Keywords
    Medical Image Analysis, Auxiliary Diagnosis, Machine Learning, Federated Learning, Privacy Protection

Abstract

Artificial intelligence has achieved remarkable results in medical imaging. Training a robust AI model requires large datasets, but data collection is constrained by communication, ethical, and privacy-protection limitations. Federated learning addresses these problems by coordinating multiple clients to train a model without sharing their raw data.

This thesis first uses relatively simple fundus image data to verify the performance of federated learning on medical image diagnosis and grading tasks. To address class imbalance in multi-class grading, it proposes W-FedAvg, a new federated aggregation algorithm that optimizes aggregation weights according to per-class proportions. When classes are imbalanced, adding more datasets brings little improvement to the model. W-FedAvg alleviates this problem, and it generalizes better than the baseline FedAvg, which aggregates client models by weighted averaging. The model was trained on 40,532 fundus images from four clients, and W-FedAvg was then evaluated on an additional dataset for diabetic retinopathy diagnosis and grading. In the diagnosis task, the four clients achieved an average AUC of 77%, whereas the W-FedAvg model reached 87%. In the grading task, the four clients' average Kappa score was 0.75, whereas the W-FedAvg model reached 0.83.

The thesis then applies federated learning to the more complex tasks of histopathological image diagnosis and grading. To handle large-scale pathology images and the heterogeneity challenges of federated learning, it designs FCL (Federated Contrastive Learning). FCL improves the model's generalization by maximizing the attention consistency between each local client model and the server model. To mitigate privacy leakage during weight transfer and to test the robustness of FCL, the thesis also uses differential privacy to add noise that further protects the model.
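
The abstract contrasts FedAvg with W-FedAvg. FedAvg weights each client by its number of training samples, while W-FedAvg derives aggregation weights from per-class proportions. The exact W-FedAvg formula is not given here, so the sketch below uses a hypothetical rule, `class_proportion_weights`. It scores each client by its average share of every class, so a small client that holds many rare-grade images is not drowned out by a large client dominated by the majority grade. All function names are illustrative, and model parameters are flattened into plain lists.

```python
from typing import List

Params = List[float]  # a model's parameters, flattened for illustration


def fedavg_weights(class_counts: List[List[int]]) -> List[float]:
    """Baseline FedAvg: weight each client by its share of all training samples."""
    sizes = [sum(counts) for counts in class_counts]
    total = sum(sizes)
    return [size / total for size in sizes]


def class_proportion_weights(class_counts: List[List[int]]) -> List[float]:
    """HYPOTHETICAL class-proportion rule (not the thesis's exact W-FedAvg formula):
    score each client by its average share of every non-empty class, then normalize."""
    num_classes = len(class_counts[0])
    class_totals = [sum(counts[c] for counts in class_counts) for c in range(num_classes)]
    scores = []
    for counts in class_counts:
        shares = [counts[c] / class_totals[c] for c in range(num_classes) if class_totals[c] > 0]
        scores.append(sum(shares) / len(shares))
    total = sum(scores)
    return [score / total for score in scores]


def aggregate(client_params: List[Params], weights: List[float]) -> Params:
    """Server step: weighted average of client parameters, element by element."""
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(len(client_params[0]))
    ]


if __name__ == "__main__":
    # Two clients, three grades; client B is small but holds most severe cases.
    counts = [[900, 90, 10], [100, 10, 90]]
    print(fedavg_weights(counts))            # about [0.833, 0.167]
    print(class_proportion_weights(counts))  # about [0.633, 0.367]
    print(aggregate([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))  # [2.0, 3.0]
```

Under this assumed rule, client B's weight rises from about 0.17 to about 0.37 because it contributes 90% of the severe cases.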
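
FCL is described as maximizing attention consistency between the local client model and the server model, but the abstract does not give its loss. The sketch below therefore borrows the model-contrastive form used by MOON (model-contrastive federated learning) and applies it to attention vectors over image patches. The local attention is pulled toward the server model's attention (the positive pair) and pushed away from the previous round's local attention (the negative pair). The function names, the temperature `tau`, and the weight `mu` are assumptions.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw per-patch attention scores into a distribution over patches."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two attention vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def attention_contrastive_loss(local_att, server_att, prev_local_att, tau=0.5):
    """MOON-style loss on attention vectors (ASSUMED form, not the thesis's exact loss):
    low when local attention agrees with the server's, high when it agrees
    more with the previous local model's."""
    pos = math.exp(cosine(local_att, server_att) / tau)
    neg = math.exp(cosine(local_att, prev_local_att) / tau)
    return -math.log(pos / (pos + neg))


def local_objective(task_loss, local_att, server_att, prev_local_att, mu=1.0, tau=0.5):
    """Hypothetical client objective: supervised loss plus weighted consistency term."""
    return task_loss + mu * attention_contrastive_loss(local_att, server_att, prev_local_att, tau)


if __name__ == "__main__":
    local = softmax([2.0, 0.5, -1.0])  # attention over three patches
    other = softmax([-1.0, 0.5, 2.0])
    print(attention_contrastive_loss(local, local, other))  # agrees with server: lower loss
    print(attention_contrastive_loss(local, other, local))  # agrees with old local: higher loss
```

The loss falls below log 2 when the local attention agrees more with the server than with its own previous state, and rises above log 2 otherwise.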
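
The abstract says differential privacy adds noise to protect weights in transit, but it does not name the mechanism. One common choice is the Gaussian mechanism applied to each client's update. The update is first clipped to an L2 norm bound `clip_norm`, and then Gaussian noise with standard deviation `noise_multiplier * clip_norm` is added to every coordinate. The parameter names are assumptions. Converting `noise_multiplier` into an (ε, δ) guarantee requires a privacy accountant, which this sketch omits.

```python
import math
import random
from typing import List


def clip_update(update: List[float], clip_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most clip_norm (bounds sensitivity)."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize_update(update: List[float], clip_norm: float,
                     noise_multiplier: float, rng: random.Random) -> List[float]:
    """Gaussian mechanism: clip, then add N(0, (noise_multiplier * clip_norm)^2) per coordinate."""
    clipped = clip_update(update, clip_norm)
    std = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, std) for x in clipped]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility in this demo only
    update = [3.0, 4.0]     # L2 norm 5.0, clipped to norm 1.0
    print(clip_update(update, 1.0))  # about [0.6, 0.8]
    print(privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger `noise_multiplier` gives stronger privacy at the cost of accuracy. Testing that trade-off is one way to probe the robustness the abstract mentions.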

FCL was evaluated on 19,635 prostate cancer histopathological images from multiple clients for cancer diagnosis and Gleason grading. In the diagnosis task, with relatively balanced classes, the seven clients achieved an average AUC of 95%, and the FCL model reached 97%. In the Gleason grading task, the six clients' average Kappa was 0.74, and the FCL model reached 0.84. The thesis also verified the models' generalization on additional test sets (one public dataset and two private test sets).

The federated learning models designed in this thesis provide a robust, accurate, and low-cost way to train AI models for medical image analysis while effectively protecting the privacy of medical data.
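
Both grading results are reported as a Kappa score, but the weighting is not stated. The sketch below assumes quadratic-weighted Cohen's kappa, a common choice for ordinal scales such as diabetic retinopathy grades and Gleason grades. It compares observed disagreement with the disagreement expected by chance, and it penalizes each disagreement by the squared distance between the two grades.

```python
from typing import List


def quadratic_weighted_kappa(y_true: List[int], y_pred: List[int], num_classes: int) -> float:
    """Cohen's kappa with quadratic weights: 1 - sum(w * O) / sum(w * E)."""
    n = len(y_true)
    # Observed confusion matrix O[i][j]: true grade i, predicted grade j.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]

    numerator = 0.0
    denominator = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2  # squared grade distance
            expected = true_hist[i] * pred_hist[j] / n        # chance agreement E[i][j]
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    print(quadratic_weighted_kappa([0, 1, 2], [0, 2, 2], num_classes=3))  # ≈ 0.8
    print(quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], num_classes=3))  # ≈ -1.0
```

A score of 1 means perfect agreement and 0 means chance-level agreement. Under this metric, the rise from 0.75 to 0.83 means W-FedAvg's predicted grades sit closer to the reference grades.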