In recent years, Advanced Persistent Threats (APTs) have continued to destabilize the global cybersecurity landscape. As the number of publicly disclosed APT attack cases grows, designing intrusion detection algorithms that detect APTs accurately has become a central concern for security practitioners. Traditional intrusion detection methods struggle to analyze attack behavior with long dormancy and persistence, which makes APT detection difficult. To counter APTs more effectively, researchers have sought out and designed new data sources. A provenance graph is a directed graph of a system's execution history that represents the control flow and data flow between entities, enabling a detection system to uncover latent attacks over a much longer time span. APT detection based on provenance graphs has therefore become an active research topic in cybersecurity. This study proposes a provenance-graph-based APT detection algorithm built on a variational graph autoencoder. It addresses a limitation of existing approaches: they depend on the completeness of provenance graphs and on the reliability of the data, which makes them hard to apply in practice. The main contributions are as follows:
(1) A novel variational graph autoencoder for APT detection is proposed, which distinguishes normal behavior from attack behavior by learning whether each edge in the provenance graph exists. Specifically, from a given provenance graph the method learns a graph model in which attack nodes are represented as isolated. The encoder learns a latent variable that predicts whether each edge exists, and the decoder reconstructs the graph structure from this prediction and decodes the input; anomalous nodes and edges are then identified from the reconstructed graph structure. During detection, the model identifies attack chains using only the encoder output, which enables fast detection.
(2) Existing methods assume that the provenance graph is complete, yet in real-world deployments provenance graphs are often incomplete because of missing system calls, lossy data compression, and lost log records. To address this, the study learns the distribution of provenance graph data with the variational graph autoencoder; because the distribution parameters adapt to the data, small amounts of missing data have less impact on model learning.
(3) Existing methods also assume that the data are reliable, yet in real-world deployments data are often unreliable because labeled data are hard to obtain, labels may be inaccurate, and APT data may be poisoned. To address this, the study proposes a semi-supervised training method that combines a small amount of labeled data with a large volume of unlabeled data, enabling the model to infer labels from the unlabeled data.
(4) The proposed method is evaluated on three APT vulnerability attack scenarios and one real-world APT exercise scenario, and compared against existing provenance-graph-based APT detection methods and against anomaly detection methods based on graph attention network autoencoders. On the same scenarios, several groups of controlled experiments compare the detection performance of the existing methods and the proposed method when provenance graph completeness and data reliability are violated. The experimental results demonstrate the effectiveness of the proposed method both for APT detection and for coping with these two problems.
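The abstract does not give the model's equations, so the following is only a minimal sketch of the standard variational graph autoencoder (Kipf and Welling's two-layer GCN encoder with an inner-product decoder), not the model proposed in this study. One difference is deliberate: the study's encoder predicts edge existence directly, whereas here an edge probability is computed as sigmoid(z_u · z_v) from node embeddings. The supervised term in the hypothetical `semi_supervised_loss`, which pushes labeled attack edges toward non-existence so that attack nodes become isolated, is an assumed illustration of contributions (1) and (3). All function names, weight shapes, and the weighting factor `lam` are assumptions. Only the forward pass and the loss are shown; training would require an automatic-differentiation framework.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def normalize_adjacency(n, edges):
    """GCN normalization D^-1/2 (A + I) D^-1/2.
    Assumption: directed provenance edges are symmetrized for simplicity."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def random_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encode(a_hat, x, w0, w_mu, w_logstd):
    """Two-layer GCN encoder: shared ReLU hidden layer, then mean and log-std heads."""
    h = [[max(0.0, v) for v in row] for row in matmul(a_hat, matmul(x, w0))]
    mu = matmul(a_hat, matmul(h, w_mu))
    logstd = matmul(a_hat, matmul(h, w_logstd))
    return mu, logstd


def reparameterize(mu, logstd, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [[m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mr, sr)]
            for mr, sr in zip(mu, logstd)]


def edge_prob(z, u, v):
    """Inner-product decoder: probability that edge (u, v) exists."""
    return sigmoid(sum(a * b for a, b in zip(z[u], z[v])))


def kl_divergence(mu, logstd):
    """KL(q(z | X, A) || N(0, I)), averaged over nodes."""
    total = 0.0
    for mr, sr in zip(mu, logstd):
        for m, s in zip(mr, sr):
            total += -0.5 * (1.0 + 2.0 * s - m * m - math.exp(2.0 * s))
    return total / len(mu)


def bce(p, target, eps=1e-7):
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def semi_supervised_loss(z, mu, logstd, unlabeled_edges, negative_edges, labeled_edges, lam=1.0):
    # Unsupervised term on unlabeled data: observed edges -> 1, sampled non-edges -> 0.
    pairs = [(e, 1.0) for e in unlabeled_edges] + [(e, 0.0) for e in negative_edges]
    recon = sum(bce(edge_prob(z, u, v), t) for (u, v), t in pairs) / max(1, len(pairs))
    # Hypothetical supervised term: benign edges (label 0) should exist and
    # attack edges (label 1) should not, so attack nodes end up isolated.
    sup = sum(bce(edge_prob(z, u, v), 1.0 - y) for u, v, y in labeled_edges)
    sup /= max(1, len(labeled_edges))
    return recon + kl_divergence(mu, logstd) + lam * sup


def edge_anomaly_scores(mu, edges):
    """Detection uses the encoder means only (no sampling): low probability -> high score."""
    return {(u, v): 1.0 - edge_prob(mu, u, v) for u, v in edges}


if __name__ == "__main__":
    rng = random.Random(0)
    n = 5
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    x = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # one-hot features
    a_hat = normalize_adjacency(n, edges)
    w0, w_mu, w_logstd = random_matrix(n, 8, rng), random_matrix(8, 4, rng), random_matrix(8, 4, rng)
    mu, logstd = encode(a_hat, x, w0, w_mu, w_logstd)
    z = reparameterize(mu, logstd, rng)
    loss = semi_supervised_loss(z, mu, logstd, edges[:3], [(0, 4)], [(3, 4, 1)])
    print(f"loss={loss:.4f}", edge_anomaly_scores(mu, edges))
```

Scoring edges from the encoder means alone, without running the sampler or decoder over the whole graph, loosely mirrors the abstract's point that detection relies only on the encoder output; in this sketch an edge whose reconstructed probability is low receives a high anomaly score.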