Method and Application of Generative Linguistic Steganography with Enhanced Statistical-Imperceptibility

Original title: 统计隐蔽性增强的文本生成式隐写方法与应用

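Author: Zhang Siyu (张思雨)
  • Student ID
    2019******
  • Degree
    Master's
  • Email
    zha******.cn
  • Defense date
    2022.05.17
  • Advisor
    Huang Yongfeng (黄永峰)
  • Discipline
    Electronics and Communication Engineering
  • Pages
    103
  • Confidentiality level
    Public
  • Department
    023 Department of Electronic Engineering
  • Chinese keywords
    隐写术,统计隐蔽性,文本隐写,生成式隐写,社交媒体
  • English keywords
    Steganography, Statistical-Imperceptibility, Linguistic Steganography, Generative Steganography, Social Media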

Abstract

Research on steganography is of great significance for safeguarding national security and social stability. Most existing generative linguistic steganographic methods follow the “language model + word-level steganographic coding algorithm” framework and can generate steganographic text with high perceptual-imperceptibility. However, the Psic Effect shows that perceptual-imperceptibility and statistical-imperceptibility are not directly proportional under this framework: existing methods tend to generate high-probability words, so the word-level steganographic coding algorithm strongly constrains text generation, which results in poor statistical-imperceptibility. This thesis follows two technical approaches, weakening the constraint that steganographic coding places on text generation and decoupling steganographic coding from word selection, to propose three generative linguistic steganographic methods with enhanced statistical-imperceptibility, and applies them to a social-media-oriented covert communication system. The research content and contributions are summarized as follows:

(1) Within the “language model + word-level steganographic coding algorithm” framework, we weaken the constraint of steganographic coding on text generation and propose a generative linguistic steganographic method based on Adaptive Dynamic Grouping (ADG). ADG adaptively partitions the vocabulary into several groups of equal probability according to the conditional probability distribution of the next token, recursively repeats the partition until no further division is possible, and establishes a one-to-one mapping between groups and secret messages. Both theoretical analysis and experimental results show that ADG has strong statistical-imperceptibility.

(2) By embedding secret information into different semantic topics rather than different words, we decouple steganographic coding from specific word selection during text generation and propose a generative linguistic steganographic method based on sentence semantic coding. The sender generates steganographic text on a specific topic with a controllable text generation model, and the receiver recovers the text semantics with a semantic classifier; a rejection-sampling generation strategy ensures correct extraction. Experiments with an implementation based on the CTRL and BERT models show that the proposed method has high statistical-imperceptibility.

(3) By dispersing the secret message through the stegotext in the form of characters, we again decouple steganographic coding from word selection and propose Pos-Stega, a generative linguistic steganographic method for character-form secret messages. Pos-Stega uses a backward and forward language model (BFLM) based on a Gumbel-Softmax coupling mechanism to control the positions of the secret-message characters in the stegotext, and the position sequence is shared with the receiver for extraction. Experimental results indicate that the proposed BFLM outperforms the baseline model and that Pos-Stega has strong statistical-imperceptibility.

(4) Based on the proposed methods, we developed the steganography module of a social-media-oriented covert communication system. To meet the requirement that stegotext generation and secret-message extraction run locally on mobile phones, the neural network language models and related functions are first converted, the models are then loaded and run for inference on the mobile device, and the embedding and extraction functions of multiple steganographic methods are implemented on top of them. Tests on a Huawei Mate 20 phone show that the proposed methods can be implemented successfully and run efficiently.
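
The grouping idea in contribution (1) can be illustrated with a small sketch. This is not the thesis's implementation: the toy next-token distribution, the function names (`split_into_groups`, `narrow`, `embed_token`, `extract_bits`) and the greedy balancing heuristic used to form near-equal groups are all assumptions made for illustration. The sketch only shows the general pattern described in the abstract: split the candidates into 2^r groups of roughly equal probability, let r secret bits select a group, repeat inside the chosen group until no further split is possible, then sample a token from what remains.

```python
import math
import random


def normalize(probs):
    """Rescale a {token: probability} dict so that it sums to 1."""
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}


def split_into_groups(probs, n_groups):
    """Greedy balancing (an assumed heuristic): largest tokens first,
    each placed in the currently lightest group. Ties are broken by
    token and group index so sender and receiver build identical groups."""
    groups = [{} for _ in range(n_groups)]
    sums = [0.0] * n_groups
    for tok, p in sorted(probs.items(), key=lambda kv: (-kv[1], kv[0])):
        i = min(range(n_groups), key=lambda g: (sums[g], g))
        groups[i][tok] = p
        sums[i] += p
    return groups


def narrow(probs, pick_group):
    """Split repeatedly until the top token holds more than half the mass.
    Assumes strictly positive probabilities.
    pick_group(groups, r) returns the index of the group to keep."""
    probs = normalize(probs)
    while True:
        r = math.floor(-math.log2(max(probs.values())))
        r = min(r, len(probs).bit_length() - 1)  # never more groups than tokens
        if r < 1:
            return probs
        groups = split_into_groups(probs, 2 ** r)
        probs = normalize(groups[pick_group(groups, r)])


def embed_token(probs, bits, rng):
    """Sender: let a prefix of `bits` (a '0'/'1' string) steer the group choice.
    Returns the sampled token and the number of bits it carries."""
    consumed = 0

    def pick(groups, r):
        nonlocal consumed
        chunk = bits[consumed:consumed + r].ljust(r, "0")  # zero-pad if bits run out
        consumed += r
        return int(chunk, 2)

    final = narrow(probs, pick)
    tokens = sorted(final)
    token = rng.choices(tokens, weights=[final[t] for t in tokens])[0]
    return token, consumed


def extract_bits(probs, token):
    """Receiver: replay the same splits and read off the groups holding `token`."""
    out = []

    def pick(groups, r):
        idx = next(i for i, g in enumerate(groups) if token in g)
        out.append(format(idx, "0{}b".format(r)))
        return idx

    narrow(probs, pick)
    return "".join(out)


# Toy next-token distribution (hypothetical values, not from the thesis).
toy_probs = {"the": 0.3, "a": 0.2, "cat": 0.15, "dog": 0.15, "sat": 0.1, "ran": 0.1}
token, n_bits = embed_token(toy_probs, "1011001110", random.Random(0))
recovered = extract_bits(toy_probs, token)
```

In a real system the distribution would come from a language model at each generation step, and both parties would need the same model and context so that the receiver can rebuild the same groups.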
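
The sentence-level semantic coding in contribution (2) can be sketched in the same spirit. Everything below is a toy stand-in: `toy_generate` replaces a controllable generator such as CTRL, `toy_classify` replaces a classifier such as BERT, and the topic list, keywords and the 30% drift rate are hypothetical values chosen only so that the rejection step has something to reject. The point illustrated is that the secret bits choose a topic rather than individual words, and that the sender keeps regenerating until the receiver's classifier would recover that topic.

```python
import random

TOPICS = ["sports", "food", "travel", "music"]  # 4 topics carry 2 bits per sentence
BITS_PER_SENTENCE = 2

KEYWORDS = {
    "sports": ["match", "goal", "team"],
    "food": ["noodles", "recipe", "dinner"],
    "travel": ["flight", "hotel", "beach"],
    "music": ["concert", "guitar", "album"],
}


def toy_generate(topic, rng):
    """Stand-in for a controllable generator; drifts to a random topic 30% of the time."""
    if rng.random() < 0.3:
        topic = rng.choice(TOPICS)
    return "I keep thinking about the {} today.".format(rng.choice(KEYWORDS[topic]))


def toy_classify(text):
    """Stand-in for a semantic classifier: simple keyword lookup."""
    for topic, words in KEYWORDS.items():
        if any(word in text for word in words):
            return topic
    return None


def embed(bits, rng, max_tries=100):
    """Sender: map each bit chunk to a topic, then rejection-sample a sentence."""
    if len(bits) % BITS_PER_SENTENCE:
        raise ValueError("bit string length must be a multiple of BITS_PER_SENTENCE")
    sentences = []
    for i in range(0, len(bits), BITS_PER_SENTENCE):
        topic = TOPICS[int(bits[i:i + BITS_PER_SENTENCE], 2)]
        for _ in range(max_tries):
            text = toy_generate(topic, rng)
            if toy_classify(text) == topic:  # accept only what the receiver would decode
                sentences.append(text)
                break
        else:
            raise RuntimeError("no acceptable sentence found")
    return sentences


def extract(sentences):
    """Receiver: classify each sentence and turn the topic index back into bits."""
    width = "0{}b".format(BITS_PER_SENTENCE)
    return "".join(format(TOPICS.index(toy_classify(s)), width) for s in sentences)


stego = embed("01101100", random.Random(1))
decoded = extract(stego)
```

Because acceptance is decided by the same classifier the receiver uses, a sentence that drifts off topic is discarded before it is ever sent, which is what makes extraction reliable in this sketch.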