Artificial intelligence technologies, especially deep learning, have made significant progress in numerous fields such as computer vision and speech recognition, and large-scale applications are on the horizon. However, existing deep learning models suffer from insufficient robustness: they can easily be deceived by adversarial examples maliciously crafted by attackers and thus produce wrong predictions. The lack of robustness of deep learning has been shown to pose threats to security-critical applications, and it also hinders the further development of deep learning. Adversarial attacks and robustness evaluation are important directions in deep learning robustness research; they aim to efficiently generate adversarial examples under different scenarios and to comprehensively evaluate the robustness of deep learning models. Research in this area helps to identify the vulnerabilities of deep learning models, to compare the robustness of different models, and to develop more robust deep learning models.

Research on adversarial attacks and robustness evaluation still faces several pressing problems. First, existing adversarial attack methods achieve low success rates and low efficiency in black-box scenarios, where the model structure and parameters are inaccessible, which hinders the analysis of models' vulnerability mechanisms. Second, the adversarial examples generated by existing attack methods lack diversity, which limits the robustness of models trained on them. Third, research on adversarial robustness evaluation remains scarce, making it difficult for researchers to effectively assess the robustness of different deep learning models and the effectiveness of adversarial attack and defense algorithms. To address these key problems, this dissertation builds a benchmark and a platform for evaluating adversarial attacks and defenses, and develops efficient adversarial attack algorithms for different scenarios. The main contributions are summarized as follows:

1. To address the low success rate of black-box transfer-based attacks, a momentum iterative method and a translation-invariant attack method are proposed. They generate adversarial examples by introducing a momentum term and by attacking a set of translated images, respectively, which greatly improves the success rate of black-box transfer-based attacks (both updates are sketched after this list). This work lays the theoretical and methodological foundation for understanding the vulnerability mechanisms of deep learning models and for discovering their security flaws.

2. To address the inefficiency of black-box decision-based attacks, an evolutionary attack method is proposed for face recognition. It models the local geometry of the search directions and reduces the dimension of the search space, effectively improving the efficiency of black-box decision-based attacks (see the sketch after this list). This work lays the theoretical and methodological foundation for uncovering the security flaws of face recognition models.

3. To address the insufficient robustness of adversarially trained models, adversarial distributional training is proposed. It characterizes the diverse adversarial examples around each natural example with an adversarial distribution, and models this distribution with three parameterizations based on adversarial attacks (the objective is sketched after this list), laying the theoretical and methodological foundation for building more robust deep learning models.

4. To address the lack of adversarial robustness evaluations, an adversarial robustness benchmark is constructed for image classification. It uses robustness curves to evaluate many typical adversarial attack and defense algorithms fairly and comprehensively (a minimal evaluation loop is sketched after this list), laying the evaluation foundation for the future development of adversarial attack and defense algorithms.
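For concreteness, a sketch of the momentum iterative update from contribution 1, in notation assumed here rather than quoted from the dissertation body: J is the classification loss, \mu the decay factor, \alpha the step size, and the iterate is projected back into the \epsilon-ball around the original image x:

    g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x_t, y)}{\lVert \nabla_x J(x_t, y) \rVert_1}, \qquad
    x_{t+1} = \mathrm{Clip}_{x,\epsilon}\big( x_t + \alpha \cdot \mathrm{sign}(g_{t+1}) \big)

Normalizing the gradient before accumulation keeps the momentum term scale-free, so past gradients stabilize the update direction across iterations instead of oscillating into poor local optima.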
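The translation-invariant attack of contribution 1 can be realized without explicitly attacking every shifted copy: averaging gradients over small translations is approximately equivalent to convolving the gradient of the untranslated image with a fixed kernel W (e.g., a Gaussian). The following one-line formulation is a sketch under that approximation, where \ast denotes convolution:

    x_{t+1} = \mathrm{Clip}_{x,\epsilon}\big( x_t + \alpha \cdot \mathrm{sign}\big( W \ast \nabla_x J(x_t, y) \big) \big)

Smoothing the gradient in this way makes the adversarial example less tied to the surrogate model's specific discriminative regions, which is what improves transfer to unseen defended models.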
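A minimal sketch of the evolutionary attack idea from contribution 2, assuming only hard-label (decision) access. This is not the dissertation's implementation: the is_adversarial oracle, the 0.99/0.01 covariance smoothing, and the success-rate step control are illustrative assumptions; only the two core ideas, a diagonal covariance modeling the local geometry of search directions and sampling in a reduced-dimensional space, follow the method's description.

    import numpy as np
    from scipy.ndimage import zoom

    def evolutionary_attack(x, is_adversarial, x_init, iters=10000,
                            search_shape=(32, 32, 3), sigma=0.01, mu=0.1):
        # x: original image in [0, 1]; x_init: any image already classified as
        # the attacker desires; is_adversarial: decision oracle (one query each).
        x_adv = x_init.copy()
        c = np.ones(search_shape)          # diagonal covariance: local geometry model
        stats = []                         # recent success indicators
        scale = [s / t for s, t in zip(x.shape, search_shape)]
        for _ in range(iters):
            # Sample in a low-dimensional space, then upsample to image size.
            z = np.random.randn(*search_shape) * np.sqrt(c)
            step = zoom(z, scale, order=1)
            # Bias the candidate toward x to shrink the distortion over time.
            cand = np.clip(x_adv + sigma * step + mu * (x - x_adv), 0.0, 1.0)
            if is_adversarial(cand):
                x_adv = cand
                c = 0.99 * c + 0.01 * z ** 2   # adapt covariance to successful moves
                stats.append(1.0)
            else:
                stats.append(0.0)
            if len(stats) >= 30:               # crude success-rate step control
                mu *= np.exp(np.mean(stats[-30:]) - 0.2)
                stats = []
        return x_adv

Each query either moves the adversarial image closer to the original or refines the sampling distribution, which is why modeling the search geometry cuts the query budget compared with isotropic random search.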
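Contribution 3 replaces the point-wise inner maximization of standard adversarial training with a maximization over a distribution of perturbations. A sketch of the objective in notation assumed here, with L the loss, \delta constrained to the \epsilon-ball, \mathcal{H} an entropy term that encourages diverse perturbations, and \lambda its weight:

    \min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{p(\delta)} \;
        \mathbb{E}_{\delta \sim p(\delta)} \big[ L\big( f_\theta(x + \delta), y \big) \big]
        + \lambda \, \mathcal{H}\big( p(\delta) \big) \Big]

Because the inner problem rewards an entire distribution of strong perturbations rather than a single worst case, the model sees more varied adversarial examples during training, which targets the diversity limitation raised above.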
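A robustness curve from contribution 4 plots classification accuracy under attack as a function of the perturbation budget, rather than reporting a single number at one fixed budget. A minimal evaluation loop, where the attack(model, xs, ys, eps) interface is an assumption for illustration and not the benchmark's actual API:

    import numpy as np

    def robustness_curve(model, attack, xs, ys, epsilons):
        # Accuracy of `model` on examples attacked at each budget `eps`.
        accs = []
        for eps in epsilons:
            x_adv = attack(model, xs, ys, eps)      # assumed attack interface
            preds = model(x_adv).argmax(axis=-1)    # model assumed to return logits
            accs.append(float((preds == ys).mean()))
        return np.array(accs)

Plotting the returned accuracies for several defenses on the same axes gives the whole-curve comparison the benchmark advocates: two defenses whose curves cross cannot be fairly ranked by any single-budget accuracy.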