Natural language generation is one of the most important research directions in artificial intelligence. While autoregressive generation, the currently dominant paradigm, produces high-quality text, the community continues to explore more efficient approaches to language modeling. In recent years, a new paradigm, non-autoregressive generation, has attracted wide attention. Instead of generating a sequence from left to right, one token at a time, it predicts all tokens in parallel, which substantially speeds up generation and alleviates several biases inherent in autoregressive decoding (a toy contrast of the two regimes is sketched after this summary). Despite these advantages, current non-autoregressive models suffer from low fluency and poor versatility across text generation applications. To address these challenges, this thesis presents four research works:

* Learning theory of non-autoregressive generative models and a training method based on proxy distributions: Building on information theory, we develop a learning theory for non-autoregressive models. It uncovers an information-loss problem in training and shows that the size of the loss depends primarily on the choice of target distribution (see the identity sketched below). We therefore propose a training method based on a proxy distribution, which effectively alleviates the information loss. Experiments on machine translation show that the method yields significant gains in fluency while maintaining a 15× inference speedup.

* Non-autoregressive generative model with directed acyclic graphs: To alleviate the problem that parallel prediction mixes multiple output modes, we introduce a directed acyclic graph (DAG) structure into non-autoregressive models. The DAG captures the many possible outputs of a single input, thereby improving fluency (a toy decoding sketch also follows this summary). Experiments on machine translation show that the method outperforms non-autoregressive baselines by 3.1 BLEU-4, is the first non-autoregressive model to match the translation quality of autoregressive models, and delivers a 7–14× inference speedup.

* Application of non-autoregressive generative models to unsupervised style transfer: To explore non-autoregressive models in unsupervised settings, this work constructs an unsupervised objective for training the non-autoregressive model on style transfer and introduces a word-alignment module that suppresses the generation of irrelevant content. Experiments show that the model effectively reduces the hallucination found in autoregressive generation, achieving faster and better style transfer.

* Pretraining of non-autoregressive generative models and applications to general-purpose generation tasks: This work proposes a non-autoregressive model pretrained on a large unlabeled corpus, which substantially improves generation quality on general-purpose downstream tasks. In automatic evaluations across multiple generation tasks, the model outperforms previous non-autoregressive baselines by 4.2 points on average, surpasses pretrained autoregressive models for the first time, and delivers a 17× improvement in throughput.
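To make the contrast between the two decoding regimes concrete, below is a minimal, self-contained sketch. It is illustrative only: `model_step` and `model_parallel` are hypothetical stand-ins for a trained decoder, stubbed here with random logits.

```python
# A minimal, illustrative contrast of autoregressive vs. non-autoregressive
# decoding. `model_step` and `model_parallel` are hypothetical stand-ins for
# a trained decoder; they return random logits purely for demonstration.
import numpy as np

VOCAB = 100  # toy vocabulary size
rng = np.random.default_rng(0)

def model_step(prefix, src):
    """Hypothetical autoregressive decoder: logits for the next token."""
    return rng.standard_normal(VOCAB)

def model_parallel(length, src):
    """Hypothetical non-autoregressive decoder: logits for all positions."""
    return rng.standard_normal((length, VOCAB))

def autoregressive_decode(src, max_len=8):
    # max_len sequential model calls: each token conditions on the prefix.
    out = []
    for _ in range(max_len):
        out.append(int(model_step(out, src).argmax()))
    return out

def non_autoregressive_decode(src, length=8):
    # One parallel model call: all positions are predicted simultaneously,
    # which is where the inference speedup comes from.
    return model_parallel(length, src).argmax(axis=-1).tolist()

print(autoregressive_decode(src="ein Beispiel"))
print(non_autoregressive_decode(src="ein Beispiel"))
```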
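The information-loss result in the first work can be glossed with a standard identity (a sketch under our reading; the thesis' exact statement may differ): when the model factorizes the target distribution position-wise, the best achievable KL divergence to the data distribution is the conditional total correlation of the target tokens.

```latex
% Sketch of the information-loss identity (the thesis' exact statement may
% differ). If the model factorizes position-wise, the best achievable KL
% divergence to the data distribution p(y|x) is the conditional total
% correlation of the target tokens:
\[
  \min_{q_1,\dots,q_T}
  \mathrm{KL}\!\left( p(y \mid x) \,\middle\|\, \prod_{t=1}^{T} q_t(y_t \mid x) \right)
  = \sum_{t=1}^{T} H(Y_t \mid X = x) \;-\; H(Y \mid X = x),
\]
% attained at q_t(y_t|x) = p(y_t|x). The gap vanishes only when the target
% tokens are conditionally independent given x; training against a proxy
% distribution with lower total correlation therefore reduces it.
```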
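For the second work, the following toy sketch shows the general idea of decoding over a DAG of predictions, under our assumptions rather than the thesis implementation: each vertex carries token logits, edges point only to later vertices, and one output sequence corresponds to one path, so different paths can store different output modes.

```python
# A toy sketch of greedy decoding over a DAG of predictions (an assumption of
# the general idea, not the thesis implementation). Each vertex carries token
# logits; edges only point to later vertices, so an output sequence is one
# left-to-right path through the graph.
import numpy as np

rng = np.random.default_rng(0)
L, VOCAB = 12, 100                              # DAG size and toy vocabulary
token_logits = rng.standard_normal((L, VOCAB))  # per-vertex token scores
trans = rng.standard_normal((L, L))             # per-edge transition scores
trans[np.tril_indices(L)] = -np.inf             # acyclicity: allow only i -> j with j > i

def greedy_dag_decode():
    i = 0
    path, tokens = [i], [int(token_logits[i].argmax())]
    while i < L - 1:
        i = int(trans[i].argmax())              # jump to the most likely next vertex
        path.append(i)
        tokens.append(int(token_logits[i].argmax()))
    return path, tokens

path, tokens = greedy_dag_decode()
print(path, tokens)  # the chosen path, hence the output length, varies per input
```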