Neural Network Knowledge Transfer Method for Resource-Constrained Environment

Author: 王朝飞
  • Student ID
    2018******
  • Degree
    Doctoral
  • Email
    wan******.cn
  • Defense date
    2023.05.18
  • Advisor
    吴澄
  • Discipline
    Control Science and Engineering
  • Pages
    115
  • Confidentiality level
    Public
  • Affiliation
    025 Department of Automation
  • Keywords
    Deep neural network, Resource constraint, Knowledge transfer, Knowledge distillation, Few-shot learning

Abstract

Efficient storage and computing power, together with large-scale annotated data, have been a major driving force behind deep neural networks reaching application-level performance. In practical scenarios, however, the storage and computing capacity of many intelligent devices is too limited to deploy large-scale neural network models, and large amounts of labeled data are hard to obtain or expensive to annotate. Deep learning for resource-constrained environments has therefore attracted wide attention; its goal is to explore new models and methods with small resource requirements. Knowledge transfer is one effective route to this problem: knowledge from other tasks or models is transferred to the target task, reducing the resource demands of the target-task model. This thesis studies neural network knowledge transfer methods for resource-constrained environments. Where computing and storage resources are limited, it studies the basic theory, knowledge representations, and transfer methods of knowledge distillation, which is built on model-to-model knowledge transfer; where data resources are limited, it studies image data augmentation and image generation under the few-shot setting, which is built on transferring knowledge across datasets or tasks. The main contributions are as follows (an illustrative code sketch for each contribution follows the abstract):

1. The advantageous role of intermediate models in knowledge distillation is discovered and explained. By establishing a connection between information bottleneck theory and knowledge distillation, "dark knowledge" is interpreted from the perspective of mutual information: an intermediate model preserves more mutual information about the input than the converged model and therefore has stronger distillation capacity.

2. A knowledge distillation method based on class activation maps is proposed. Used as a knowledge representation in self-distillation, class activation maps effectively improve the classification performance of convolutional neural networks without increasing training cost; used in offline distillation, they improve the convergence speed and classification performance of the student model.

3. A knowledge distillation method based on teacher-student cooperative curriculum customization is proposed. It explores the effectiveness of combining curriculum learning with knowledge distillation: a weighted ensemble of the teacher and student models serves as the sample-difficulty measurer, and the ensemble weights are updated dynamically so that the student transfers knowledge from the teacher progressively.

4. A few-shot image data augmentation method based on foreground object transformation is proposed. Salient object detection is introduced to remove image backgrounds, and foreground transformation knowledge learned on base-class data is transferred to novel classes to enlarge their sample sets. The method effectively addresses the large intra-class variance and small inter-class variance of fine-grained images, and, combined with fine-tuning-based few-shot image classification baselines, it is competitive with the most advanced methods.

5. A few-shot image generation method based on relaxed spatial structural alignment is proposed. A cross-domain spatial structure consistency constraint transfers the spatial structure knowledge of source-domain images to the few-shot target domain, improving the diversity of the generated images, and a latent space compression strategy improves the learning efficiency of target-domain features. The method reaches the state of the art in image generation under extremely few samples. In addition, a new evaluation metric, SCS, is proposed to measure the quality of image spatial structure, an effective complement to existing quantitative metrics.
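The mutual-information view in contribution 1 suggests distilling from an intermediate, not fully converged, teacher checkpoint. Below is a minimal sketch of the standard soft-target distillation loss (after Hinton et al.) set up that way; the checkpoint path, temperature, and weighting are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Weighted sum of soft (KL at temperature T) and hard (CE) targets."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-target gradients on the same scale as CE
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Hypothetical usage with an intermediate (not final) teacher checkpoint:
# teacher.load_state_dict(torch.load("teacher_epoch_60.pt")); teacher.eval()
# with torch.no_grad():
#     t_logits = teacher(images)
# loss = distillation_loss(student(images), t_logits, labels)
```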

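Contribution 2 uses class activation maps (CAM) as the transferred knowledge. The sketch below computes per-class CAMs from the final convolutional features and the classifier weights, then aligns student CAMs to teacher CAMs with an MSE loss; the function names, normalization, and loss form are assumptions, one plausible instantiation rather than the thesis's exact method.

```python
import torch
import torch.nn.functional as F

def class_activation_maps(features, fc_weight):
    """CAMs for every class. features: (B, C, H, W); fc_weight: (K, C) -> (B, K, H, W)."""
    b, _, h, w = features.shape
    cams = F.relu(torch.einsum("bchw,kc->bkhw", features, fc_weight))
    # normalize each map to [0, 1] so teacher and student CAMs are comparable
    flat = cams.flatten(2)
    mx = flat.max(dim=2, keepdim=True).values.clamp_min(1e-6)
    return (flat / mx).view(b, -1, h, w)

def cam_alignment_loss(student_feats, student_fc_w, teacher_feats, teacher_fc_w):
    s = class_activation_maps(student_feats, student_fc_w)
    t = class_activation_maps(teacher_feats, teacher_fc_w).detach()
    if s.shape[-2:] != t.shape[-2:]:  # teacher/student spatial sizes may differ
        s = F.interpolate(s, size=t.shape[-2:], mode="bilinear", align_corners=False)
    return F.mse_loss(s, t)
```

For self-distillation the same loss can be applied between CAMs of a deep branch and a shallow branch of one network, so no separate teacher is trained.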
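For contribution 3, the difficulty measurer is a weighted ensemble of teacher and student predictions whose weight shifts during training. The sketch below scores each sample by the ensemble's cross-entropy, keeps only the easiest fraction of the batch, and linearly moves trust from teacher to student; both linear schedules are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def difficulty(student_logits, teacher_logits, labels, w_teacher):
    """Per-sample cross-entropy of the weighted ensemble (higher = harder)."""
    probs = (w_teacher * F.softmax(teacher_logits, dim=1)
             + (1.0 - w_teacher) * F.softmax(student_logits, dim=1))
    return F.nll_loss(probs.clamp_min(1e-8).log(), labels, reduction="none")

def curriculum_loss(student_logits, teacher_logits, labels, epoch, max_epoch):
    w_teacher = max(0.0, 1.0 - epoch / max_epoch)  # trust the student more over time
    keep = 0.5 + 0.5 * epoch / max_epoch           # grow the kept fraction to 1.0
    with torch.no_grad():
        d = difficulty(student_logits, teacher_logits, labels, w_teacher)
        easy = d.topk(max(1, int(keep * len(d))), largest=False).indices
    return F.cross_entropy(student_logits[easy], labels[easy])
```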
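Contribution 4 separates a salient foreground from its background and transforms the foreground to enlarge novel-class data. The sketch below composites a randomly transformed foreground back over the original background; the saliency mask is assumed to come from any off-the-shelf salient object detector, and the affine ranges are placeholders, not the thesis's learned base-class transformations.

```python
import random
import torchvision.transforms.functional as TF

def transform_foreground(image, saliency_mask):
    """image: (3, H, W) tensor; saliency_mask: (1, H, W) tensor in [0, 1]."""
    fg = image * saliency_mask
    bg = image * (1.0 - saliency_mask)
    angle = random.uniform(-15.0, 15.0)
    scale = random.uniform(0.8, 1.2)
    shift = [random.randint(-10, 10), random.randint(-10, 10)]
    fg_t = TF.affine(fg, angle=angle, translate=shift, scale=scale, shear=[0.0])
    m_t = TF.affine(saliency_mask, angle=angle, translate=shift, scale=scale, shear=[0.0])
    # paste the transformed foreground over the original background
    return fg_t + bg * (1.0 - m_t)
```

Contribution 5 constrains the adapted generator to preserve the spatial structure of the frozen source generator for the same latent code. The sketch below penalizes divergence between spatial self-similarity maps of corresponding intermediate feature layers; this self-similarity formulation is one plausible reading of the "relaxed" alignment, not necessarily the thesis's exact loss.

```python
import torch
import torch.nn.functional as F

def spatial_self_similarity(feat):
    """feat: (B, C, H, W) -> (B, H*W, H*W) cosine similarity between locations."""
    v = F.normalize(feat.flatten(2).transpose(1, 2), dim=2)  # (B, HW, C)
    return v @ v.transpose(1, 2)

def structure_consistency_loss(source_feats, target_feats):
    """KL between row-normalized source/target similarity maps, summed over layers."""
    loss = 0.0
    for fs, ft in zip(source_feats, target_feats):
        st = F.log_softmax(spatial_self_similarity(ft), dim=2)
        ss = F.softmax(spatial_self_similarity(fs).detach(), dim=2)
        loss = loss + F.kl_div(st, ss, reduction="batchmean")
    return loss
```

Matching relaxed (row-normalized) similarity distributions rather than raw feature values is what leaves the generator free to adapt appearance to the target domain while keeping source-domain layout, which is the property the SCS metric is meant to quantify.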