
Effective Adaptation of Large-scale Pre-trained Language Models

Author: 丁宁 (Ning Ding)
  • Student ID
    2018******
  • Degree
    Doctoral
  • Email
    din******.cn
  • Defense Date
    2023.05.17
  • Advisor
    郑海涛 (Hai-Tao Zheng)
  • Discipline
    Computer Science and Technology
  • Pages
    163
  • Confidentiality
    Public
  • Department
    024 Department of Computer Science and Technology
  • Keywords
    Pre-trained Language Models, Adaptation, Data-wise Effectiveness, Computational Efficiency

Abstract

In recent years, large-scale pre-trained language models have become the basic paradigm of modern natural language processing. These models are first pre-trained in a self-supervised manner on large unlabeled corpora, and the pre-trained model then serves as the starting point from which it is adapted to each downstream task. The early approach to such adaptation is fine-tuning, which optimizes and adjusts the parameters separately for every task. As model sizes keep growing, however, fine-tuning faces serious challenges: it copes poorly with scenarios where labeled data is scarce, and it incurs enormous computational and storage costs. This dissertation studies efficient adaptation techniques for large-scale pre-trained language models from two perspectives, data efficiency and computational efficiency, and builds programming systems and datasets around these techniques to facilitate practical deployment.

For data-efficient model adaptation, the dissertation takes knowledge acquisition as its application focus. For scenarios with abundant weakly supervised text, it proposes a geometrically interpretable prototype learning method that learns latent class representations to better cope with the shortage of finely annotated data. For scenarios without weakly supervised text, it further proposes hypersphere prototypes, a representation method that improves the stability of few-shot learning. For few-shot named entity recognition, the dissertation constructs the first and, to date, the largest fine-grained dataset together with three concrete recognition tasks, advancing few-shot knowledge acquisition. It also exploits the intrinsic properties language models acquire during pre-training and uses prompt learning to perform fine-grained knowledge acquisition, substantially outperforming conventional fine-tuning in few-shot and zero-shot settings.

For computation-efficient model adaptation, the dissertation proposes the delta tuning framework and carries out a comprehensive analysis, theoretical discussion, and experimental exploration of parameter-efficient adaptation of large language models, studying the empirical performance, transferability, effect of model scale, composability, generalization error, and computational efficiency of delta tuning on more than 100 tasks. It further applies second-order optimization to large pre-trained language models under the delta tuning framework and proposes Newton-step clipping to stabilize training.

Finally, the dissertation builds a suite of open-source systems for data-efficient and computation-efficient adaptation. On the data-efficiency side, it develops OpenPrompt, a prompt learning system with a unified paradigm that covers the full pipeline from data processing to template construction to model training. On the computation-efficiency side, it develops OpenDelta, a delta tuning system with a unified paradigm that rewires the tensor flow of a model without modifying any of its source code, so that delta tuning can be attached to arbitrary models and to arbitrary positions within them.
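To make the hypersphere prototype idea concrete, the following is a minimal PyTorch sketch (not the thesis's exact formulation): each class is represented by a learnable center and radius in embedding space, and an instance is scored by its distance to the hypersphere surface.

    import torch
    import torch.nn as nn

    class HyperspherePrototypes(nn.Module):
        """Sketch: each class is a hypersphere (center + radius) in embedding space."""
        def __init__(self, num_classes: int, dim: int):
            super().__init__()
            self.centers = nn.Parameter(torch.randn(num_classes, dim))
            self.log_radii = nn.Parameter(torch.zeros(num_classes))  # radius kept positive via exp

        def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
            # embeddings: (batch, dim); distance from each point to each class center
            dist_to_center = torch.cdist(embeddings, self.centers)       # (batch, num_classes)
            dist_to_surface = dist_to_center - self.log_radii.exp()      # signed distance to the sphere
            return -dist_to_surface.abs()                                # higher score = closer to the class sphere

    # Usage: logits = HyperspherePrototypes(num_classes=5, dim=768)(encoder_output)
    # and train with cross-entropy over these scores.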
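The Newton-step clipping mentioned in the abstract can be illustrated, purely hypothetically, as capping the norm of an approximate second-order update before applying it; the sketch below shows that idea only and is not the thesis's algorithm.

    import torch

    def clipped_newton_step(grad: torch.Tensor, hessian: torch.Tensor,
                            max_norm: float = 1.0, damping: float = 1e-3) -> torch.Tensor:
        """Illustrative only: damped Newton update H^{-1} g with its norm clipped.

        Clipping bounds the update when the (approximate) Hessian is ill-conditioned,
        which is the kind of instability Newton-step clipping is meant to control.
        grad has n elements and hessian is expected to be (n, n).
        """
        n = grad.numel()
        h = hessian + damping * torch.eye(n, dtype=grad.dtype, device=grad.device)
        step = torch.linalg.solve(h, grad.reshape(-1, 1)).reshape(grad.shape)
        norm = step.norm()
        if norm > max_norm:
            step = step * (max_norm / norm)   # rescale, keep the direction
        return step

    # Usage: parameters are updated as p <- p - lr * clipped_newton_step(g, H).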
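A usage sketch of the prompt-learning pipeline that OpenPrompt covers (data processing, template construction, model training). The class and argument names follow OpenPrompt's published examples; treat the exact signatures as assumptions rather than a definitive API reference.

    # Sketch of a prompt-learning pipeline in the spirit of OpenPrompt.
    # Names (load_plm, ManualTemplate, ManualVerbalizer, PromptForClassification,
    # PromptDataLoader) follow OpenPrompt's examples; exact arguments may differ by version.
    from openprompt.plms import load_plm
    from openprompt.prompts import ManualTemplate, ManualVerbalizer
    from openprompt import PromptForClassification, PromptDataLoader
    from openprompt.data_utils import InputExample

    plm, tokenizer, model_config, wrapper_class = load_plm("bert", "bert-base-cased")

    # Template: wraps the input text and reserves a mask position for the label word.
    template = ManualTemplate(
        text='{"placeholder":"text_a"} It was {"mask"}.',
        tokenizer=tokenizer,
    )

    # Verbalizer: maps label words predicted at the mask position to task classes.
    verbalizer = ManualVerbalizer(
        classes=["negative", "positive"],
        label_words={"negative": ["terrible"], "positive": ["great"]},
        tokenizer=tokenizer,
    )

    model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)

    dataset = [InputExample(guid=0, text_a="A touching and well-acted film.", label=1)]
    dataloader = PromptDataLoader(
        dataset=dataset, template=template, tokenizer=tokenizer,
        tokenizer_wrapper_class=wrapper_class, batch_size=1,
    )
    for batch in dataloader:
        logits = model(batch)  # train with cross-entropy as in ordinary classification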
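A sketch of the OpenDelta-style workflow for delta tuning: load an unmodified backbone, attach a delta module (here LoRA) to chosen positions, and train only the delta parameters plus the task head. Names again follow OpenDelta's published examples and should be read as assumptions, not a frozen API.

    # Sketch of delta tuning with OpenDelta: the backbone code is untouched; the delta
    # module is attached by (suffix-)matching module names, and only delta parameters
    # and the classifier remain trainable. Argument names may differ across versions.
    from transformers import AutoModelForSequenceClassification
    from opendelta import LoraModel

    backbone = AutoModelForSequenceClassification.from_pretrained("bert-base-cased")

    # Insert LoRA modules into the attention query/value projections.
    delta = LoraModel(backbone_model=backbone, modified_modules=["query", "value"])

    # Freeze everything except the delta parameters and the classification head.
    delta.freeze_module(exclude=["deltas", "classifier"])
    delta.log()  # inspect which modules were modified and how many parameters stay trainable

    # The backbone can now be trained with any standard training loop;
    # only a small fraction of the parameters receives gradients.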