Research of A Provenance Graph Fusion-based Dependency Analysis Framework for Attack Investigation

Original title (Chinese): 基于溯源图融合的精确依赖分析与攻击溯源框架研究

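Author: Wang Ruihua (王瑞华)
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    412******com
  • Defense date
    2023.05.21
  • Advisor
    Wan Hai (万海)
  • Discipline
    Software Engineering
  • Pages
    70
  • Confidentiality level
    Public
  • Department
    410 School of Software
  • Chinese keywords
    APT攻击,依赖爆炸问题,融合溯源图,日志关联分析,攻击溯源
  • English keywords
    Advanced Persistent Threat, Dependency Explosion Problem, Hybrid Holistic Provenance Graph, Log Correlation Analysis, Attack Investigation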

Abstract

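An APT attack (advanced persistent threat) is a new type of attack that targets countries, enterprises, or large organizations. It is characterized by a long latency period, covert attack actions, and severe damage. The rise of APT attacks poses a major challenge to traditional provenance graph-based attack investigation methods: when long-running applications are involved, the traditional provenance graph contains nodes with very large numbers of incoming and outgoing edges. Such nodes introduce many false dependencies, which leads to the well-known "dependency explosion" problem.

The dependency explosion problem is the core problem of attack investigation in APT scenarios. In recent years, instrumentation-free investigation algorithms based on log fusion have become the mainstream approach to mitigating it, but each existing scheme has shortcomings: solutions based on binary analysis cannot keep up with the pace of software updates in industrial settings; solutions based on simple graph-merging rules fit only a limited set of specific scenarios; and methods based on large numbers of hard-coded rules do not scale when the attack scenario changes.

This thesis proposes a general investigation model that is instrumentation-free and relies on neither binary analysis nor hard-coded rules, in order to solve the dependency explosion problem in attack investigation under APT scenarios. The thesis calls this model a provenance graph fusion-based framework for precise dependency analysis and attack investigation. It consists of two parts. (1) The thesis proposes a provenance graph fusion algorithm based on log correlation analysis, which merges the single-source provenance graphs produced by different log sources into one hybrid holistic provenance graph that carries rich semantic information. The thesis first designs an efficient log correlation algorithm that generates the correlated log graph in linear time; it then uses the correlated log graph to establish event correlations between the single-source provenance graphs and thereby builds the hybrid holistic provenance graph. (2) The thesis proposes a shortcut-based attack investigation algorithm on the hybrid holistic provenance graph. The algorithm exploits the structure of the hybrid holistic provenance graph: during investigation it uses graph search to find "shortcuts" that bypass dependency explosion nodes, and so constructs the attack story precisely. The algorithm improves on traditional attack investigation algorithms and contains two main mechanisms, a shortcut search mechanism and a shortcut scoring mechanism, which select a suitable shortcut when the investigation reaches a dependency explosion node and thus avoid false positives.

The thesis evaluates the algorithms with 4 groups of experiments on 18 datasets, which cover a large number of APT scenarios and APT attack methods. The experimental results show that the provenance graph fusion algorithm based on log correlation analysis builds the hybrid holistic provenance graph correctly and effectively; that the shortcut-based attack investigation algorithm effectively mitigates the dependency explosion problem and constructs the attack story precisely; and that the algorithms handle the massive data volumes of APT scenarios efficiently and complete APT attack investigation tasks in industrial settings quickly.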
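To make the dependency explosion problem concrete, the following minimal sketch runs a conventional time-bounded backward traversal over a toy event list. It is an illustration under stated assumptions, not the thesis's implementation: the event format `(src, dst, t)`, the function name `backward_track`, and the node names `browser`, `socket_i`, and `dropper.exe` are all hypothetical.

```python
from collections import defaultdict


def backward_track(events, symptom):
    """Return every event the symptom may depend on.

    events:  list of (src, dst, t) tuples (hypothetical format)
    symptom: the (src, dst, t) event where the investigation starts

    Conservative rule: an event that flows into a node is treated as a
    possible cause of every later event that leaves the same node.
    """
    incoming = defaultdict(list)
    for ev in events:
        incoming[ev[1]].append(ev)

    seen = {symptom}
    stack = [symptom]
    while stack:
        src, _, t = stack.pop()
        for ev in incoming[src]:
            if ev[2] <= t and ev not in seen:
                seen.add(ev)
                stack.append(ev)
    return seen


# A long-running process reads from 100 sockets, then writes one file.
events = [(f"socket_{i}", "browser", i) for i in range(100)]
symptom = ("browser", "dropper.exe", 200)
events.append(symptom)

result = backward_track(events, symptom)
print(len(result))  # 101: the symptom plus all 100 socket reads
```

Even if only one socket actually delivered the payload, the traversal keeps all 100 reads, because the graph alone gives no way to tell them apart. This is the false-dependency behaviour that the abstract attributes to nodes with many incoming and outgoing edges.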

Advanced Persistent Threats (APTs) have recently become one of the most critical cyberspace threats to countries, enterprises, and institutions. APT attacks often last a long time and use covert techniques to cause greater damage. Traditional provenance graph-based forensic analysis methods face serious problems when dealing with long-running applications in APT scenarios, because such applications perform many input and output operations during their lifetime. The resulting nodes have a large number of incoming and outgoing edges, which hinders backtracking analysis because every outgoing edge is assumed to depend on every incoming edge. This is the so-called dependency explosion problem.

The dependency explosion problem is the core problem of attack investigation in APT scenarios. Recently, several log fusion-based methods have been proposed that leverage different types of logs to address this problem and improve performance without instrumentation. However, each of these fusion mechanisms relies on one of three things: a set of sophisticated fusion rules, which does not scale; simple correlation rules, which apply only to limited scenarios; or binary reverse engineering, which must be redone whenever the binary changes.

The goal of this thesis is to find a general log fusion-based approach to the dependency explosion problem that does not rely on instrumentation, binary analysis, or handcrafted rules. The thesis proposes a provenance graph fusion-based framework for dependency analysis and attack investigation, which consists of two phases. (1) In the first phase, the framework constructs individual provenance graphs from different logs. It then uses log correlation analysis to find correlations among the edges of those individual provenance graphs. The analysis algorithm generates a correlated log graph (CLG) in linear time. Based on the CLG, the individual provenance graphs are stitched together to form a hybrid holistic provenance graph (HHPG). (2) In the second phase, starting from the symptom event in the HHPG, the framework performs attack investigation with improved versions of the traditional forward and backward analysis algorithms. When the algorithm reaches a dependency explosion node, it selects the correct outgoing or incoming edge with the help of a proper shortcut. A shortcut discovery algorithm and a shortcut ranking algorithm are proposed to find that shortcut. The algorithm can then bypass the dependency explosion node through the shortcut and continue the investigation.

To validate each step and the overall performance of the framework, the thesis conducts 4 sets of experiments on 18 datasets. These datasets cover a large number of APT scenarios and APT attack methods. The experimental results confirm the completeness of the correlation among different provenance graphs and the existence of shadow path pairs around dependency explosion nodes, and show that the framework rebuilds the attack story correctly and efficiently, with good interpretability.
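The idea of bypassing a dependency explosion node can be sketched as follows. This is a simplified illustration and not the thesis's shortcut discovery or ranking algorithm, which the abstract does not specify. The following are assumptions made only for the example: cross-log correlation is reduced to a lookup table `request_of` that maps a system-level event to an application-level request id, an in-degree `threshold` decides what counts as a dependency explosion node, and candidates are ranked by recency.

```python
from collections import defaultdict


def backward_track_with_shortcuts(events, symptom, request_of, threshold=10):
    """Backward traversal that prunes at high in-degree nodes.

    events:     list of (src, dst, t) system-level events (hypothetical format)
    request_of: dict from event to an application-level request id
                (hypothetical stand-in for cross-log correlation)
    threshold:  candidate count above which a node is treated as a
                dependency explosion node (illustrative value)
    """
    incoming = defaultdict(list)
    for ev in events:
        incoming[ev[1]].append(ev)

    seen = {symptom}
    stack = [symptom]
    while stack:
        current = stack.pop()
        src, _, t = current
        candidates = [ev for ev in incoming[src] if ev[2] <= t]
        if len(candidates) > threshold:
            rid = request_of.get(current)
            linked = [ev for ev in candidates
                      if rid is not None and request_of.get(ev) == rid]
            if linked:
                # Illustrative ranking: keep the most recent linked event.
                candidates = [max(linked, key=lambda ev: ev[2])]
            # With no linked event, fall back to the conservative rule.
        for ev in candidates:
            if ev not in seen:
                seen.add(ev)
                stack.append(ev)
    return seen


events = [(f"socket_{i}", "browser", i) for i in range(100)]
symptom = ("browser", "dropper.exe", 200)
events.append(symptom)

# Hypothetical correlation: an application log ties one read to the write.
request_of = {("socket_42", "browser", 42): "req-42", symptom: "req-42"}

pruned = backward_track_with_shortcuts(events, symptom, request_of)
print(sorted(pruned))
```

In this toy case the traversal keeps only the symptom and the one correlated socket read. When no correlation is available, the sketch falls back to following every earlier incoming edge, so it never drops a dependency that it cannot explain.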