Few-shot Learning: Theories and Methods

Author: 周琳钧
  • Student ID
    2016******
  • Degree
    Doctoral
  • Email
    zho******com
  • Defense date
    2021.05.17
  • Advisor
    杨士强
  • Discipline
    Computer Science and Technology
  • Pages
    115
  • Confidentiality
    Public
  • Department
    024 Department of Computer Science
  • Keywords
    Few-shot Learning, Model Generalization, Transfer Learning, Deep Learning

Abstract

The emergence of deep learning has greatly advanced Artificial Intelligence. Unlike human intelligence, however, deep learning models have strong fitting ability but rather weak reasoning ability, so when training samples are scarce they cannot learn as quickly as humans do. In many practical scenarios, collecting training samples is costly and insufficient training data is common; the concept of few-shot learning arose against this background. The goal of few-shot learning is to improve the learning ability of machine learning models under insufficient training samples. Its research aims are to enhance model generalization, use training data more efficiently, and strengthen reasoning ability, all of which are necessary steps toward real artificial intelligence. This dissertation studies few-shot learning for deep learning models: we theoretically improve the classification layer, the representation layer, and the input layer of deep networks to strengthen their few-shot learning ability, and finally, for the practical scenario of black-box adversarial attack, we improve the attack algorithm from a few-shot learning perspective. The main contributions are as follows.

From the perspective of the classification layer, we propose a few-shot learning method based on visual analogy. The fast learning ability of human beings is attributed to visual analogy: people can find previously learned categories that are similar to a new few-shot category and generalize from them. We propose the Visual Analogy Graph Embedded Regression (VAGER) algorithm at the classification layer of deep neural networks to simulate this mechanism and enhance the generalization ability of the model.

From the perspective of the representation layer, we propose a representation optimized for few-shot learning. The traditional representation layer of deep neural networks is suited to learning from large-scale training sets and often performs poorly in few-shot settings. Starting from Minimum Variance Unbiased Estimation (MVUE) theory, we propose Discriminative Variational Embedding (DVE), a representation better suited to few-shot learning that substantially improves the few-shot learning efficiency of deep models.

From the perspective of the input layer, we propose the base class selection problem in few-shot learning for the first time. The base classes learned before the few-shot novel classes are crucial to the generalization ability of the model, and well-chosen base classes can greatly improve the learning efficiency on novel classes. We therefore propose a base class selection method based on a Similarity Ratio, which constructs a base dataset with stronger generalization ability from a large pool of candidate classes.

From the perspective of application, we apply few-shot learning to a practical black-box adversarial attack scenario. We propose the Eigen Black-box Attack (EigenBA), which uses the gradient information of a pre-trained model to help infer the optimal perturbation direction, effectively reducing the number of queries to the black-box model and achieving more effective attacks.
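The abstract describes the visual-analogy idea only at a high level, so the following is a minimal hypothetical sketch rather than the VAGER algorithm itself: it synthesizes classifier weights for few-shot novel classes as similarity-weighted combinations of base-class classifier weights. The function name, the prototype inputs, and the softmax weighting are illustrative assumptions.

```python
import numpy as np

def analogy_classifier(novel_protos, base_protos, base_weights, temperature=10.0):
    """Hypothetical illustration of classification by visual analogy.

    novel_protos : (C_novel, d) mean features of the few-shot novel classes
    base_protos  : (C_base, d)  mean features of the base classes
    base_weights : (C_base, d)  classifier weights learned on the base classes

    Returns synthesized (C_novel, d) classifier weights for the novel classes,
    each built as a softmax-weighted combination of base-class weights, where
    the weights come from cosine similarity between class prototypes.
    """
    def normalize(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

    sim = normalize(novel_protos) @ normalize(base_protos).T   # (C_novel, C_base)
    sim = np.exp(temperature * sim)
    analogy = sim / sim.sum(axis=1, keepdims=True)             # analogy coefficients
    return analogy @ base_weights                               # (C_novel, d)
```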
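As background for the MVUE motivation of the representation-layer contribution (a standard statistical fact, not the dissertation's own derivation): for i.i.d. Gaussian samples the sample mean is the minimum variance unbiased estimator of the class mean, and its variance decays only as 1/n.

```latex
% Standard MVUE fact: for x_1,\dots,x_n \sim \mathcal{N}(\mu,\sigma^2) i.i.d.,
% the sample mean is the minimum variance unbiased estimator of \mu.
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\mathbb{E}[\hat{\mu}] = \mu, \qquad
\operatorname{Var}(\hat{\mu}) = \frac{\sigma^2}{n}.
```

With n equal to 1 or 5 shots, the variance term stays large; this estimation error is the kind of obstacle a few-shot-oriented representation has to contend with.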
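The abstract does not define the Similarity Ratio, so the sketch below only illustrates the general shape such a selection procedure could take, under an assumed criterion: greedily add the candidate class whose prototype is most similar to the expected novel classes relative to its redundancy with the base classes already chosen. The criterion, function name, and inputs are assumptions.

```python
import numpy as np

def select_base_classes(cand_protos, novel_protos, k):
    """Greedy base-class selection under an *assumed* similarity-ratio criterion.

    cand_protos  : (C_cand, d)  prototype features of candidate base classes
    novel_protos : (C_novel, d) prototype features representing target novel classes
    k            : number of base classes to select
    Returns indices of the selected candidate classes.
    """
    def cos(a, b):
        a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
        b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
        return a @ b.T

    k = min(k, len(cand_protos))
    to_novel = cos(cand_protos, novel_protos).mean(axis=1)  # relevance to novel classes
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for c in range(len(cand_protos)):
            if c in selected:
                continue
            # redundancy: similarity to the base classes already selected
            red = cos(cand_protos[c:c + 1], cand_protos[selected]).mean() if selected else 0.0
            score = to_novel[c] / (1.0 + max(red, 0.0))      # assumed "similarity ratio"
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
    return selected
```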
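Purely as an illustration of the query-reduction idea (the abstract does not spell out EigenBA's actual procedure), the sketch below uses the leading right singular vectors of a pre-trained surrogate model's Jacobian as candidate perturbation directions and spends black-box queries only to check whether the true-class score drops. The function names, the step schedule, and the stopping rule are assumptions, not the dissertation's algorithm.

```python
import numpy as np

def eigen_style_attack(x, black_box_score, surrogate_jacobian,
                       step=0.01, top_k=5, max_queries=200):
    """Illustrative subspace-guided black-box attack (a sketch, not EigenBA itself).

    x                  : (d,) flattened input to perturb
    black_box_score    : f(x) -> scalar true-class score from the black-box model
                         (each call counts as one query)
    surrogate_jacobian : g(x) -> (m, d) Jacobian of a pre-trained white-box model at x
    """
    x_adv = x.copy()
    best = black_box_score(x_adv)
    queries = 1
    while queries < max_queries:
        # Candidate directions: leading right singular vectors of the surrogate
        # Jacobian, i.e. the input directions the surrogate is most sensitive to.
        _, _, vt = np.linalg.svd(surrogate_jacobian(x_adv), full_matrices=False)
        improved = False
        for v in vt[:top_k]:
            for sign in (+1.0, -1.0):
                cand = x_adv + sign * step * v
                score = black_box_score(cand)
                queries += 1
                if score < best:               # true-class score dropped: keep the step
                    x_adv, best, improved = cand, score, True
                    break
                if queries >= max_queries:
                    return x_adv
            if improved:
                break
        if not improved:
            step *= 2.0                        # no progress along top directions: enlarge step
    return x_adv
```

The point of the sketch is only that the surrogate's gradient structure proposes directions for free, so black-box queries are spent on evaluation rather than on estimating gradients from scratch.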