
Deep Learning for Language and Speech Processing in Low Resource Scenarios: Methods and Applications

Author: 徐进
  • Student ID
    2021******
  • Degree
    Doctoral
  • Email
    jxu******com
  • Defense Date
    2023.05.19
  • Advisor
    李建
  • Discipline
    Computer Science and Technology
  • Pages
    121
  • Confidentiality Level
    Public
  • Department
    047 Institute for Interdisciplinary Information Sciences
  • Keywords
    Deep Learning, Low Resource, Real-world Applications, Speech Processing, Natural Language Processing

Abstract

While deep learning models have achieved great success in various tasks such as speech recognition and machine translation, they still rely heavily on massive amounts of collected data, efficient architectures designed by human experts, and costly computational hardware for training and deployment. In real-world scenarios, developing a high-performance neural network can be challenging because data resources, human expert resources, and computational resources are usually limited, which hinders the application of deep neural networks in industry. In this thesis, taking these real-world limitations into consideration, we aim to develop efficient neural networks under three low-resource scenarios: low data resources, low human expert resources, and low computational resources. For each scenario, we study how to improve model performance so that it approaches that of models trained under rich-resource settings and ultimately meets the requirements of practical industrial deployment. To address the challenges of these low-resource scenarios, this thesis proposes novel and effective methods tailored to the concrete tasks. The main results of the thesis are as follows:

1) For the low-data-resource scenario, we propose several techniques, including cross-lingual pre-training, dual transformation, and data-level knowledge distillation, and demonstrate their effectiveness on speech synthesis (text to speech, TTS) and speech recognition (automatic speech recognition, ASR) for rare languages. We first pre-train on rich-resource languages and fine-tune on low-resource languages, then leverage dual transformation between TTS and ASR so that the two models iteratively boost each other's accuracy, and finally design several metrics to filter out low-quality generated speech data for TTS and ASR customization through data-level knowledge distillation. Our methods achieve high TTS quality in terms of both intelligibility (an intelligibility rate above 98%) and naturalness (a mean opinion score (MOS) above 3.5) of the synthesized speech. The proposed methods have been deployed in Microsoft Azure to support TTS for dozens of languages.
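
To make the interplay between the two tasks concrete, the following is a minimal, illustrative sketch of a dual-transformation loop of the kind described above. It is not the thesis implementation: TinyTTS, TinyASR, Example, filter_pairs, and the confidence threshold are hypothetical placeholders standing in for real TTS/ASR models and for the quality metrics used in the data-level knowledge distillation.

```python
# A toy sketch of dual transformation between TTS and ASR (hypothetical
# placeholders throughout; not the thesis code). Unpaired text is synthesized
# into speech to create pseudo pairs for ASR training, unpaired speech is
# transcribed into text to create pseudo pairs for TTS training, and a simple
# confidence filter stands in for the data-level knowledge-distillation metrics.

import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    text: str
    speech: List[float]   # stand-in for an acoustic feature sequence
    score: float = 1.0    # quality / confidence score of the pseudo pair


class TinyTTS:
    """Placeholder TTS model: maps text to a fake feature sequence."""

    def synthesize(self, text: str) -> Tuple[List[float], float]:
        feats = [float(ord(c) % 7) for c in text]   # toy "audio"
        confidence = random.uniform(0.5, 1.0)       # toy quality estimate
        return feats, confidence

    def train(self, pairs: List[Example]) -> None:
        print(f"[TTS] fine-tuned on {len(pairs)} pseudo pairs")


class TinyASR:
    """Placeholder ASR model: maps a feature sequence to fake text."""

    def transcribe(self, speech: List[float]) -> Tuple[str, float]:
        text = "".join(chr(97 + int(x) % 26) for x in speech)  # toy transcript
        confidence = random.uniform(0.5, 1.0)
        return text, confidence

    def train(self, pairs: List[Example]) -> None:
        print(f"[ASR] fine-tuned on {len(pairs)} pseudo pairs")


def filter_pairs(pairs: List[Example], threshold: float) -> List[Example]:
    """Data-level filtering: keep only pseudo pairs whose score is high enough."""
    return [p for p in pairs if p.score >= threshold]


def dual_transformation(tts: TinyTTS, asr: TinyASR,
                        unpaired_text: List[str],
                        unpaired_speech: List[List[float]],
                        rounds: int = 3, threshold: float = 0.7) -> None:
    for r in range(rounds):
        # Text -> synthesized speech: pseudo pairs used to train ASR.
        tts_pairs = []
        for t in unpaired_text:
            speech, conf = tts.synthesize(t)
            tts_pairs.append(Example(text=t, speech=speech, score=conf))
        asr.train(filter_pairs(tts_pairs, threshold))

        # Speech -> transcribed text: pseudo pairs used to train TTS.
        asr_pairs = []
        for s in unpaired_speech:
            text, conf = asr.transcribe(s)
            asr_pairs.append(Example(text=text, speech=s, score=conf))
        tts.train(filter_pairs(asr_pairs, threshold))
        print(f"--- dual-transformation round {r + 1} finished ---")


if __name__ == "__main__":
    dual_transformation(TinyTTS(), TinyASR(),
                        unpaired_text=["hello world", "low resource speech"],
                        unpaired_speech=[[1.0, 3.0, 5.0], [2.0, 4.0]])
```

The key point is the alternation: in each round the current TTS model produces pseudo-paired data that improves ASR, the improved ASR produces pseudo-paired data that improves TTS, and low-quality pseudo pairs are filtered out before training.
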
2) For the low-human-expert-resource scenario, we propose a neural architecture search (NAS) algorithm to find novel architectures automatically. Designing neural architectures requires substantial domain knowledge from human experts, which hinders the application of deep learning in domains such as biomedicine and healthcare. Different from previous NAS methods, we first conduct thorough experimental analyses of the interference issue in earlier NAS approaches and then, based on these findings, propose a simple and efficient neural architecture search algorithm. We demonstrate its effectiveness on BERT pre-training and reading comprehension tasks: the discovered architecture outperforms RoBERTa_base by 1.1 and 0.6 points and ELECTRA_base by 1.6 and 1.1 points on the GLUE dev and test sets, respectively.

3) For the low-computational-resource scenario, we propose a task-agnostic, adaptive-size compression algorithm to support the deployment of large models on various resource-restricted devices. Current state-of-the-art neural networks (e.g., BERT) usually have large numbers of parameters and incur high computational and memory costs, which makes them difficult to deploy in real-world applications. To compress these models for different tasks and devices, we propose NAS-BERT, which trains a big supernet over a carefully designed search space containing a variety of architectures and outputs multiple compressed models with adaptive sizes and latencies. Furthermore, we employ several techniques to improve search efficiency and accuracy. Extensive experiments on the GLUE and SQuAD benchmarks demonstrate that NAS-BERT finds lightweight, adaptive models with better accuracy than previous approaches. The proposed methods have been deployed in Microsoft ML.NET and Microsoft Azure for text analysis on mobile devices.
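
Contributions 2 and 3 both rest on weight-sharing architecture search: a single supernet containing many candidate architectures is trained once, and sub-models are then selected, for example under different size budgets. The sketch below is a toy, hedged illustration of that general recipe rather than the thesis method; SuperNet, MixedLayer, the random training data, and the parameter budgets are hypothetical placeholders, and the interference analyses and additional efficiency techniques mentioned above are omitted.

```python
# A toy sketch of weight-sharing supernet training plus adaptive-size model
# selection (hypothetical placeholders throughout; not the NAS-BERT code).
# Each layer holds several candidate ops of different hidden sizes; random
# sub-architectures are sampled during training, and afterwards the best
# sub-model under each parameter budget is kept.

import itertools
import random

import torch
import torch.nn as nn


class MixedLayer(nn.Module):
    """One supernet layer holding candidate ops with different hidden sizes."""

    def __init__(self, dim: int, hidden_sizes=(16, 32, 64)):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, h), nn.ReLU(), nn.Linear(h, dim))
            for h in hidden_sizes
        )

    def forward(self, x: torch.Tensor, choice: int) -> torch.Tensor:
        return self.ops[choice](x)  # only the sampled candidate is executed


class SuperNet(nn.Module):
    def __init__(self, dim: int = 32, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(MixedLayer(dim) for _ in range(num_layers))
        self.head = nn.Linear(dim, 2)

    def forward(self, x: torch.Tensor, arch: tuple) -> torch.Tensor:
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return self.head(x)


def sub_model_params(net: SuperNet, arch: tuple) -> int:
    """Parameter count of the sub-model picked out by `arch`."""
    n = sum(p.numel() for p in net.head.parameters())
    for layer, choice in zip(net.layers, arch):
        n += sum(p.numel() for p in layer.ops[choice].parameters())
    return n


def train_supernet(net: SuperNet, steps: int = 200) -> None:
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x = torch.randn(16, 32)                 # toy inputs
        y = (x.mean(dim=1) > 0).long()          # toy labels
        # Sample one sub-architecture per step; only its ops (and the shared
        # head) receive gradient updates.
        arch = tuple(random.randrange(len(layer.ops)) for layer in net.layers)
        loss = loss_fn(net(x, arch), y)
        opt.zero_grad()
        loss.backward()
        opt.step()


@torch.no_grad()
def evaluate(net: SuperNet, arch: tuple) -> float:
    x = torch.randn(256, 32)
    y = (x.mean(dim=1) > 0).long()
    return (net(x, arch).argmax(dim=1) == y).float().mean().item()


def select_per_budget(net: SuperNet, budgets=(5_000, 10_000, 20_000)) -> dict:
    """For each parameter budget, keep the best-scoring sub-model that fits."""
    best = {}
    for arch in itertools.product(*[range(len(layer.ops)) for layer in net.layers]):
        size, acc = sub_model_params(net, arch), evaluate(net, arch)
        for b in budgets:
            if size <= b and acc > best.get(b, (None, -1.0))[1]:
                best[b] = (arch, acc)
    return best


if __name__ == "__main__":
    net = SuperNet()
    train_supernet(net)
    for budget, (arch, acc) in sorted(select_per_budget(net).items()):
        print(f"budget={budget:>6} params  arch={arch}  toy accuracy={acc:.2f}")
```

In a realistic setting, the candidate operations would be Transformer-style layers of different widths and depths, and latency on the target device would typically be measured or predicted alongside the parameter count when picking a sub-model for a given budget.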