Instrumentation is a fundamental technique for dynamic program analysis and fuzzing. The basic idea is to insert probes into the program so that additional logic runs during execution and collects rich information to guide fuzzing. However, existing instrumentation techniques suffer from limited capability, low speed, and poor precision, which affect every stage of fuzzing. When compiling the program for instrumentation, the instrumentation requirements cannot be adjusted dynamically as fuzzing progresses; as a result, developers compromise instrumentation quality for execution speed. When executing the program, the large overhead of instrumentation reduces both the target program's execution speed and the fuzzer's processing speed. When collecting feedback, the precision of the feedback is especially low on large projects, so it provides little effective guidance to the fuzzer.

In this paper, we enhance instrumentation capability, reduce execution overhead, and improve feedback quality from three aspects: on-demand adjustment of the instrumentation scheme, probe pruning and acceleration, and precise collection of feedback. We further implement the corresponding tools and apply them in practice. In summary, this paper makes the following contributions:

1. We propose an on-demand instrumentation strategy based on online recompilation. Focusing on the compilation and instrumentation stage before fuzzing runs, we introduce an online recompilation technique that adapts to varying instrumentation requirements and achieves on-demand instrumentation.
2. We propose an efficient instrumentation technique based on instruction set features. Focusing on the program execution stage during fuzzing, we address the challenge of complex probe logic by fully exploiting the capabilities of the processor's instruction set, which accelerates feedback collection and analysis for coverage.
3. We propose a precise instrumentation method based on whole-program analysis. Focusing on the feedback collection stage after fuzzing runs, and targeting the complex logic and feedback explosion of large programs, we leverage whole-program analysis to achieve precise instrumentation and coverage collection in multi-process, multi-binary scenarios.

Based on the above research, we design and implement the corresponding instrumentation framework and fuzzer. The on-demand instrumentation strategy improves instrumentation performance by 3× and 17× over the state-of-the-art static and dynamic instrumentation frameworks, respectively. By relying on instruction set features, coverage collection is 23× faster and coverage processing is 6× faster. With whole-program analysis, we prune 50% of the probes and also improve fuzzing effectiveness on large projects such as distributed systems, raising overall coverage by 1.7× on average.
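
To make the notions of a probe, coverage collection, and coverage processing concrete, the following is a minimal sketch and not the framework described in this paper. It assumes an AFL-style edge-hashing scheme, which the abstract does not specify; the names `CoverageMap`, `probe`, and `has_new_coverage` are hypothetical and chosen only for illustration.

```python
MAP_SIZE = 1 << 16  # assumed bitmap size; illustrative only


class CoverageMap:
    """Hit-count map that probes write to (collection side)."""

    def __init__(self, size=MAP_SIZE):
        self.size = size
        self.hits = bytearray(size)
        self.prev = 0  # shifted ID of the previously executed block

    def probe(self, block_id):
        # Logic a probe would run at the entry of each basic block:
        # hash the (previous block, current block) pair into one slot.
        edge = (block_id ^ self.prev) % self.size
        # Saturate at 255 so the one-byte counter never wraps to zero.
        self.hits[edge] = min(self.hits[edge] + 1, 255)
        # Shift so that edges A->B and B->A land in different slots.
        self.prev = block_id >> 1


def has_new_coverage(run_hits, seen):
    """Processing side: merge one run into `seen`, report new edges."""
    new = False
    for i, count in enumerate(run_hits):
        if count and not seen[i]:
            seen[i] = 1
            new = True
    return new
```

In this sketch, `probe` stands for the per-block overhead paid inside the target program, and the byte-by-byte scan in `has_new_coverage` stands for the per-execution cost paid by the fuzzer; these are the two costs the abstract refers to as coverage collection and coverage processing.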