As an important branch of machine learning, deep learning aims to train artificial neural networks on massive amounts of data so that they discover the internal patterns of the data and thereby approach human-level intelligence. Deep learning has broad application potential, for example in face recognition security systems and self-driving systems. However, the security and privacy issues present in deep learning algorithms hinder their large-scale practical deployment. To enhance the trustworthiness and reliability of deep learning model outputs, to protect private information in deep learning from being leaked, and to advance both the theory and the practical deployment of deep learning, this thesis conducts security and privacy analyses of key problems in deep learning algorithms.

This thesis first studies the protection of intellectual property for self-supervised learning models. In the traditional supervised learning paradigm, the training performance of a model depends on labeled data, whereas self-supervised learning can train pre-trained encoders with stronger generalization ability from unlabeled datasets and has gradually become a mainstream paradigm of deep learning. However, the privacy protection of the parameters of self-supervised learning models has not yet received attention. Taking the mainstream self-supervised learning algorithms SimCLR, MoCo, and BYOL as examples, this thesis demonstrates for the first time that self-supervised pre-trained encoders are vulnerable to model stealing attacks (a simplified sketch of this threat follows below). To effectively protect the intellectual property of self-supervised learning models, this thesis proposes SSLGuard, the first watermarking framework for self-supervised models, which establishes model ownership through the embedding and extraction of a secret vector. Experimental results show that SSLGuard is highly robust and withstands a variety of watermark erasure attacks, including watermark rewriting.

The second problem discussed in this thesis is the robustness evaluation of deep learning models. To obtain more secure and reliable evaluation results, this thesis proposes MART, the first model robustness evaluation platform based on a modular design. Since the robust accuracy of a deep learning model on adversarial examples is a key indicator of its robustness, and stronger adversarial attacks are more helpful for exploring the lower bound of model robustness, this thesis uses MART to explore stronger adversarial attacks and thereby produce more credible robustness evaluation results. Experimental results show that, compared with the current benchmark evaluation platform AutoAttack, MART further reduces the robust accuracy of the evaluated models.

The third problem studied in this thesis is the security analysis of test-time adaptation methods. Deep learning theory typically assumes that the training data and the test data are independently and identically distributed. When the inputs come from a different distribution, the poor generalization of deep learning models leads to a severe drop in performance. Test-time adaptation methods dynamically adjust the model parameters according to the distribution of the input data in order to improve performance. However, the input data may be maliciously tampered with by an adversary and thus pose a security threat to the model. Taking four mainstream test-time adaptation methods, TTT, DUA, TENT, and RPL, as examples, this thesis demonstrates for the first time that test-time adaptation methods are vulnerable to poisoning attacks. The results show that test-time adaptation methods contain serious security vulnerabilities, and this study calls for resistance to poisoning attacks to be incorporated into the design of test-time adaptation algorithms.
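To make the encoder-stealing threat concrete, the following is a minimal, hypothetical PyTorch sketch of the general attack idea: the adversary queries the victim encoder with unlabeled surrogate data and trains a local copy to reproduce its embeddings. The victim and surrogate architectures, the query data, and the cosine-similarity objective are illustrative assumptions rather than the exact attack configuration evaluated in the thesis.

```python
import torch
import torch.nn.functional as F
import torchvision

# Hypothetical setup: a black-box victim encoder (e.g., pre-trained with
# SimCLR/MoCo/BYOL) that returns embeddings, and a local surrogate encoder.
victim = torchvision.models.resnet50(num_classes=128).eval()   # stand-in for the remote encoder API
surrogate = torchvision.models.resnet18(num_classes=128)
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

# Unlabeled query set controlled by the adversary (illustrative choice).
query_set = torchvision.datasets.FakeData(
    size=1024, image_size=(3, 224, 224),
    transform=torchvision.transforms.ToTensor())
loader = torch.utils.data.DataLoader(query_set, batch_size=64, shuffle=True)

for epoch in range(10):
    for images, _ in loader:                      # labels are never used
        with torch.no_grad():
            target = victim(images)               # query the victim's embeddings
        pred = surrogate(images)
        # Align surrogate and victim embeddings; cosine similarity is one
        # common choice for stealing contrastive encoders.
        loss = 1 - F.cosine_similarity(pred, target, dim=1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

A successful surrogate inherits much of the victim's representation quality without paying the pre-training cost, which is the kind of misappropriation that a watermark such as SSLGuard is designed to expose.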
As an important branch of machine learning, deep learning trains artificial neural networks with powerful fitting capabilities, enabling them to be used in a variety of real-life and security-sensitive scenarios, such as face recognition security systems, self-driving systems, and intelligent hospital diagnostic systems. However, security and privacy issues in deep learning have hindered its use in large-scale practical applications. In order to enhance the trustworthiness and reliability of deep learning models, protect the private information in deep learning from being leaked, and promote the development of deep learning theory and practical applications, this thesis addresses key security and privacy issues in deep learning.

We first investigate the issue of intellectual property protection for pre-trained encoders based on self-supervised learning. In supervised learning, the training performance of a model relies on labeled data, while self-supervised learning can train pre-trained encoders with strong generalization ability from unlabeled data, thus removing the reliance on training labels. In this thesis, we demonstrate for the first time that self-supervised pre-trained encoders are vulnerable to model stealing attacks, using the mainstream self-supervised learning algorithms SimCLR, MoCo, and BYOL as examples. In order to effectively protect the intellectual property of self-supervised learning models, we propose SSLGuard, a watermarking framework that verifies model ownership by injecting and extracting secret vectors. The experimental results demonstrate that SSLGuard is robust against various watermark erasure strategies, including watermark rewriting attacks and model fine-tuning attacks.

We then discuss the robustness evaluation of deep learning models. In order to obtain more secure and reliable evaluation results, we propose MART, a model robustness evaluation platform based on a modular design. Since the robust accuracy of a deep learning model on adversarial examples is a key indicator of model robustness (a simplified illustration of this metric appears at the end of this section), and stronger adversarial attacks are more helpful for exploring the lower bound of model robustness, we use MART to explore stronger adversarial attack algorithms and output more credible robustness evaluation results. The experimental results demonstrate that MART can further reduce the robust accuracy of the evaluated models compared to the current benchmark evaluation platform AutoAttack.

The third problem investigated in this thesis is the security analysis of test-time adaptation methods. Deep learning theory typically assumes that the training and test data are independently and identically distributed. However, when the test samples come from a different distribution, the poor generalization of deep learning models can lead to severe performance degradation. Test-time adaptation methods dynamically adjust the model parameters based on the input data to improve model performance. However, the input data may be maliciously tampered with by an adversary. In this part, we demonstrate for the first time the vulnerability of test-time adaptation methods to poisoning attacks, using four mainstream test-time adaptation methods, TTT, DUA, TENT, and RPL, as examples.
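To make concrete why such poisoning is possible at all, the sketch below shows the general mechanism behind entropy-minimization-based test-time adaptation in the spirit of TENT: at inference time the model updates the affine parameters of its normalization layers using whatever unlabeled batch arrives, so an adversary who controls part of the test stream directly influences the weights. This is a simplified, hypothetical sketch rather than the reference implementation of TTT, DUA, TENT, or RPL; the model and data are placeholders.

```python
import torch
import torch.nn as nn
import torchvision

# Placeholder source model; in practice this is a network trained on the source domain.
model = torchvision.models.resnet18(num_classes=10)
model.eval()

# Freeze everything, then re-enable only the affine parameters of the
# normalization layers and let them use current-batch statistics.
for p in model.parameters():
    p.requires_grad_(False)

adapt_params = []
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.train()                        # normalize with batch statistics
        m.track_running_stats = False    # do not update running estimates
        m.weight.requires_grad_(True)
        m.bias.requires_grad_(True)
        adapt_params += [m.weight, m.bias]

optimizer = torch.optim.SGD(adapt_params, lr=1e-3, momentum=0.9)

def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the softmax predictions."""
    return -(logits.softmax(dim=1) * logits.log_softmax(dim=1)).sum(dim=1).mean()

def adapt_and_predict(batch: torch.Tensor) -> torch.Tensor:
    """One adaptation step on an unlabeled test batch, then predict.

    The update depends only on the incoming batch, which is why an adversary
    who injects malicious samples into the test stream can steer the weights.
    """
    loss = entropy(model(batch))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return model(batch).argmax(dim=1)

# Example: a (possibly attacker-controlled) batch of 32 CIFAR-sized inputs.
predictions = adapt_and_predict(torch.rand(32, 3, 32, 32))
```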
The results show that test-time adaptation methods have serious security vulnerabilities, and this study calls for resistance to poisoning attacks to be incorporated into the design of test-time adaptation algorithms.
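Returning to the robustness-evaluation part summarized above, its central metric is robust accuracy: the fraction of adversarially perturbed test inputs that the model still classifies correctly, so a stronger attack drives this number lower and yields a more credible estimate. The sketch below illustrates the metric with a standard L-infinity PGD attack under the assumption that inputs lie in [0, 1]; it is only an illustrative baseline, not the MART platform or the specific attacks it composes, and the model and data loader are assumed to be supplied by the user.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD: iteratively maximize the loss inside an eps-ball."""
    adv = (images + torch.empty_like(images).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()            # ascend the loss
        adv = torch.min(torch.max(adv, images - eps), images + eps).clamp(0, 1)
    return adv.detach()

def robust_accuracy(model, loader, device="cpu"):
    """Fraction of PGD-perturbed test inputs that are still classified correctly."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        adv = pgd_attack(model, images, labels)
        with torch.no_grad():
            correct += (model(adv).argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

Running several such attacks and keeping the worst case per sample is the general strategy behind ensembles such as AutoAttack, against which MART is compared in the thesis.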