Although deep learning has achieved considerable breakthroughs, deep learning systems often exhibit abnormal behavior in corner cases. In safety-critical scenarios, thorough testing of deep learning systems is essential to ensure their safety and reliability. Because deep neural networks (DNNs) differ greatly from traditional software programs, deep learning (DL) testing incurs substantial resource overhead when it relies on traditional software testing techniques to generate test inputs. The diversity of deep learning systems also limits how well mainstream testing algorithms, which are designed for convolutional neural networks (CNNs), perform on other model types such as recurrent neural networks (RNNs). Moreover, coverage criteria for deep learning systems do not reflect the completeness of the testing process as well as coverage does in software testing, so they fall short as guidance for testing.

To address these difficulties and challenges, this paper conducts research on three aspects: coverage-guided testing, differential fuzz testing for CNNs, and state-guided adversarial testing for RNNs. The main contents are as follows:

(1) To address the problem that coverage criteria in DL testing cannot effectively guide the generation of test inputs, this paper designs an efficient and lightweight coverage-guided deep learning testing framework. It serves as the base framework throughout this paper and supports mainstream neural network models, including CNNs and RNNs. The framework consists of three modules: adversarial search, coverage guidance, and joint optimization. Depending on the function of the system under test, the adversarial search module triggers abnormal behavior by maximizing the prediction error. The coverage guidance module defines coverage metrics suited to the structure of the specific model and works to increase coverage. Finally, the joint optimization module generates test inputs by jointly solving the optimization objectives of the first two modules with gradient-based methods.

(2) To reduce the resource overhead of deep learning testing, this paper applies fuzz testing to CNNs and proposes DLFuzz, the first coverage-guided differential fuzz testing framework. DLFuzz uses differential testing to avoid the cost of manual labeling, and it removes the need to collect deep learning systems with similar functionality for cross-referencing. DLFuzz also provides multiple neuron selection strategies to improve neuron coverage during testing. Compared with DeepXplore, the state-of-the-art work at the time, DLFuzz generates 338.59% more adversarial test inputs and improves neuron coverage while consuming less time.

(3) To address the lack of testing methodologies for RNNs, especially in the important sequence-to-sequence (seq2seq) scenarios where safety risks are high, this paper proposes RNN-Test, a state-guided adversarial testing framework. Unlike most existing work, which is limited to classification tasks, RNN-Test mainly targets seq2seq models and applications. Based on the unique structure of RNNs, RNN-Test introduces a novel adversarial search algorithm and two state-based coverage metrics. Compared with several testing techniques, including FGSM, DLFuzz, testRNN, and DeepStellar, RNN-Test is more effective across multiple types of models and efficiently generates high-quality adversarial test inputs.
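
The two quantities that the coverage-guided framework combines can be illustrated with a minimal sketch. It is not the implementation of DLFuzz or RNN-Test. It assumes the commonly used threshold-based notion of neuron coverage (a neuron counts as covered once its scaled activation exceeds a threshold on at least one input), and it assumes that the joint objective is a weighted sum of the adversarial term and the activations of the selected neurons. The function names, the threshold of 0.5, and the `weight` parameter are hypothetical choices made only for illustration.

```python
from typing import Sequence, Set


def neuron_coverage(activations: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> float:
    """Fraction of neurons activated above `threshold` by at least one input.

    activations[i][j] is the (already scaled) output of neuron j on input i.
    The threshold value is an assumption, not taken from the source.
    """
    if not activations or not activations[0]:
        return 0.0
    num_neurons = len(activations[0])
    covered: Set[int] = set()
    for row in activations:
        for j, value in enumerate(row):
            if value > threshold:
                covered.add(j)
    return len(covered) / num_neurons


def joint_objective(prediction_error: float,
                    selected_activations: Sequence[float],
                    weight: float = 1.0) -> float:
    """Hypothetical joint objective to maximize: adversarial term plus a
    weighted coverage term (activations of currently uncovered neurons)."""
    return prediction_error + weight * sum(selected_activations)


# Two test inputs, four neurons: neurons 0 and 1 exceed the threshold.
acts = [
    [0.9, 0.1, 0.2, 0.0],
    [0.3, 0.7, 0.1, 0.0],
]
coverage = neuron_coverage(acts, threshold=0.5)

# Neurons 2 and 3 are still uncovered, so their activations on the first
# input enter the coverage term of the objective.
objective = joint_objective(0.4, [0.2, 0.0], weight=0.5)
```

In a gradient-based search, an objective of this form would be differentiated with respect to the input, and the input would be perturbed in the direction that increases it, so that a single update pushes toward both a misprediction and the activation of new neurons.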