Recent studies have shown that deep neural networks trained with third-party resources (such as training samples or backbone networks) are exposed to backdoor threats. A backdoored model behaves normally on benign samples, but once it is presented with a poisoned sample carrying a predefined trigger, the backdoor is activated and the attacker can maliciously manipulate the prediction results. Most existing backdoor attacks target image classification and have rarely been applied to object detection. This paper reveals that backdoor attacks can also occur in object detection, posing a potential threat to many safety-critical applications such as pedestrian detection and autonomous driving. The paper investigates two topics: backdoor attack methods against object detection, and copyright protection of object detection datasets based on backdoor watermarking. The main research contents and contributions are as follows:

(1) Backdoor attack methods against object detection. This paper proposes a backdoor attack paradigm for object detection. Based on the characteristics of the detection task, it designs a poisoning-based backdoor attack framework that embeds a backdoor into the training samples and the model without degrading the victim model's detection performance on benign samples, while steering the victim model to output attacker-specified incorrect detection results. Within this framework, attacks are divided into visible and invisible backdoor attacks according to how the training images are modified. For visible backdoor attacks, four different attack methods are proposed, and extensive experiments verify their effectiveness in both the digital and the physical space. For invisible backdoor attacks, the trigger that would otherwise be added directly to the training images is replaced by almost imperceptible perturbations, which are obtained with a two-stage optimization algorithm based on gradient alignment. This attack is highly stealthy because the pixel changes introduced by the perturbations are very small relative to the original image, and the experimental results demonstrate its effectiveness.

(2) Copyright protection of object detection datasets based on backdoor watermarking. Starting from the application of backdoor attacks on object detection, this paper explores copyright protection of object detection datasets with backdoor watermarks. It first analyzes why other techniques, such as encryption and differential privacy, are unsuitable for protecting the copyright of object detection datasets, and then proposes using backdoor watermarks for copyright verification. Experiments demonstrate the effectiveness of this approach, showing that backdoor watermarking can be used for ownership verification and thereby protect the copyright of object detection datasets. Extensive ablation studies further examine how factors such as the watermarking method, the trigger type, and the number of watermarked samples affect the verification results, enhancing the robustness and credibility of the verification.
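As an illustration of the poisoning pipeline summarized in contribution (1), the following is a minimal sketch of how a visible trigger and manipulated annotations could be injected into a detection training set. The dataset layout, the poison_detection_dataset helper, the fixed top-left patch location, and the relabel-to-target behaviour are illustrative assumptions, not the exact attack variants studied in this paper.

```python
import random

def poison_detection_dataset(images, annotations, trigger, target_class,
                             poison_rate=0.1):
    """Stamp a visible trigger patch onto a fraction of training images and
    rewrite their box labels to an attacker-chosen class.

    images:      list of HxWx3 uint8 arrays
    annotations: list of lists of (x1, y1, x2, y2, class_id) tuples
    trigger:     small hxwx3 uint8 patch pasted at a fixed image corner
    """
    poisoned_ids = random.sample(range(len(images)),
                                 int(poison_rate * len(images)))
    th, tw = trigger.shape[:2]
    for i in poisoned_ids:
        img = images[i].copy()
        img[:th, :tw] = trigger                  # paste the patch at the top-left corner
        images[i] = img
        # relabel every object to the target class; other conceivable variants
        # could instead drop boxes (object disappearance) or shift coordinates
        annotations[i] = [(x1, y1, x2, y2, target_class)
                          for (x1, y1, x2, y2, _) in annotations[i]]
    return images, annotations, poisoned_ids
```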
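The invisible variant replaces the visible patch with a bounded perturbation found by optimization. Below is a simplified, single-stage sketch of the gradient-alignment idea: the paper uses a two-stage algorithm, whereas here a fixed surrogate model, a generic loss_fn, and an L_inf budget eps are assumed, and the perturbation is tuned so that the training gradient of the poisoned batch imitates the gradient of the attacker's desired objective.

```python
import torch

def alignment_loss(poison_grads, target_grads):
    """1 - cosine similarity between two lists of gradient tensors."""
    dot = sum((p * t).sum() for p, t in zip(poison_grads, target_grads))
    p_norm = torch.sqrt(sum((p * p).sum() for p in poison_grads))
    t_norm = torch.sqrt(sum((t * t).sum() for t in target_grads))
    return 1.0 - dot / (p_norm * t_norm)

def craft_invisible_perturbation(model, loss_fn, poison_x, poison_y,
                                 target_x, target_y, eps=8 / 255,
                                 steps=200, lr=0.01):
    """Optimise an L_inf-bounded perturbation whose training gradient aligns
    with the attacker's desired objective (surrogate model held fixed)."""
    params = [p for p in model.parameters() if p.requires_grad]

    # gradient direction the attacker wants the poisoned batch to imitate
    target_grads = torch.autograd.grad(loss_fn(model(target_x), target_y), params)

    delta = torch.zeros_like(poison_x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        poison_grads = torch.autograd.grad(
            loss_fn(model(poison_x + delta), poison_y), params, create_graph=True)
        loss = alignment_loss(poison_grads, target_grads)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)              # keep the perturbation nearly invisible
    return delta.detach()
```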
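For contribution (2), ownership verification can be phrased as a hypothesis test on a suspicious detector's behaviour: if the detector was trained on the watermarked dataset, trigger-stamped images should score noticeably higher for the owner-chosen target class. The sketch below assumes a hypothetical detector_score probe, a trigger_fn stamping routine, and a paired one-sided t-test; the paper's actual verification statistic and thresholds may differ.

```python
import numpy as np
from scipy import stats

def verify_dataset_ownership(detector_score, benign_images, trigger_fn,
                             target_class, alpha=0.05):
    """Paired hypothesis test for dataset ownership verification.

    detector_score(image, class_id) -> highest confidence the suspicious
    detector assigns to `class_id` anywhere in the image (hypothetical API).
    trigger_fn(image) -> the same image with the watermark trigger stamped on.
    """
    benign = np.array([detector_score(img, target_class) for img in benign_images])
    stamped = np.array([detector_score(trigger_fn(img), target_class)
                        for img in benign_images])
    # one-sided paired t-test; H0: stamping the trigger does not change the scores
    t_stat, p_two_sided = stats.ttest_rel(stamped, benign)
    p_value = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
    return {"suspected_theft": bool(p_value < alpha), "p_value": float(p_value)}
```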