解耦表征学习旨在从观测数据中提取并解耦表示不同解释因子的特征,是解决人工智能如何理解世界、学习概念、迁移知识等问题的关键。以变分自编码器和对抗生成网络为代表的解耦表征方法推动表征学习领域取得了长足发展,但现有方法仍然面临诸多问题。首先,主流模型通常依赖于生成网络,其训练相对困难,在复杂情形下生成数据较难拟合真实分布。再者,主流模型对解耦表征的可解释性和可操控性不足,其解耦表征往往与研究者预定义的概念不匹配。进一步地,尽管主流模型在简单数据与合成数据上取得了一定的效果,但上述问题不可避免地导致了在复杂真实数据上,模型解耦能力不足,对后继应用的帮助有限。本文将围绕上述问题展开,充分利用真实数据中的信息,提出可解释、可操控的表征解耦模型,并利用解耦表征提高模型在后继任务和应用中的表现。本文主要贡献如下:

1. 提出了一种基于对抗学习的解耦表征方法。该方法利用标签信息来控制表征的解耦,以对抗学习的方式使表征的各个部分分别提取到对应于各个解释因子的信息,而不采用生成网络。本文利用该方法解耦复杂脑电数据中的人体耐力因子与个体背景因子,并从中挖掘出了与人体耐力表现相关的大脑皮层区域和脑电信号频带。

2. 提出了一种基于因果机制的解耦表征方法。该方法利用部分标签信息,并引入因果关系对模型表征进行约束,实现了有监督因子和无监督因子的解耦。在图像数据上,本文利用该方法构建了具体的光影分离网络,通过使用语义标签和因果结构的信息,解耦了图像中的语义特征和光影特征。在下游任务中,将解耦的光影特征对图像进行增广,解决了训练样本数量与多样性不足的问题,提高了识别模型的准确率与对光影干扰的鲁棒性。

3. 基于解耦的光影表征,构建了一个通用光影知识库,提出了一种跨领域数据增广与知识迁移的方法。为将解耦的光影表征有效迁移到其他领域,采用聚类筛选和特征插值的方式构建通用光影知识库。将该知识库应用到自然图像的小样本分类任务中,提出了基于光影特征的跨领域数据增广方法。该方法兼容转导设定和归纳设定下的小样本分类模型,提高了模型对光影干扰的鲁棒性。此外,本文还提出了图像空间中的光影迁移方法,通过训练简单重构网络,将光影特征迁移到了任意目标物体上,合成了具有似真光影的新图像。
Disentangled representation learning aims to extract and decouple features representing different explanatory factors from observed data, and is key to how artificial intelligence understands the world, learns concepts, and transfers knowledge. Disentangling methods represented by variational autoencoders and generative adversarial networks have driven great progress in representation learning, but existing methods still face many problems. First, mainstream models usually rely on generative networks, which are relatively difficult to train; in complex situations, the generated data may fail to fit the true distribution. Moreover, the disentangled representations of mainstream models lack interpretability and manipulability, and often do not match the concepts predefined by researchers. Although mainstream models have achieved promising results on simple and synthetic data, the above problems inevitably lead to insufficient disentangling ability on complex real-world data and limited benefit to downstream applications. To address these issues, this dissertation proposes interpretable and manipulable representation disentangling models that fully exploit the information in real data, and uses the disentangled representations to improve model performance in downstream tasks and applications. The main contributions are as follows:

1. This dissertation proposes a supervised disentangled representation method based on adversarial learning. The method controls the disentanglement of the representation with label information for predefined explanatory factors: the information corresponding to each explanatory factor is extracted into a designated fragment of the representation by adversarial learning, without resorting to generative networks.
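The label-supervised adversarial scheme above can be illustrated with a minimal sketch. All names here are hypothetical, and the architecture is an assumption for illustration only: an encoder splits its output into two fragments, a main head reads the label from the first fragment, and an adversarial head with gradient reversal strips label information from the second fragment — no generative network is involved.

```python
# Minimal sketch (hypothetical names, illustrative architecture) of
# adversarial disentanglement supervised by labels, without a generator.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses gradients in the backward
    pass, so the encoder is trained to *fool* the adversary."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad


class AdversarialDisentangler(nn.Module):
    def __init__(self, in_dim, frag_dim, n_classes):
        super().__init__()
        # encoder emits two fragments of size frag_dim each
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 2 * frag_dim)
        )
        self.frag_dim = frag_dim
        # main head: predicts the label from the label-related fragment
        self.cls_head = nn.Linear(frag_dim, n_classes)
        # adversary: tries to predict the label from the residual fragment
        self.adv_head = nn.Linear(frag_dim, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        z_label, z_rest = z[:, : self.frag_dim], z[:, self.frag_dim :]
        # gradient reversal drives label information out of z_rest
        return self.cls_head(z_label), self.adv_head(GradReverse.apply(z_rest))
```

Training would minimize cross-entropy on both heads; through the reversed gradients, the encoder simultaneously maximizes the adversary's loss, so the residual fragment ends up carrying only label-irrelevant (e.g. background) information.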
In a neuroscience application, the proposed method is used to disentangle human endurance factors from individual background factors in complex electroencephalogram (EEG) data, and to discover the cerebral cortex regions and EEG frequency bands related to human endurance performance.

2. This dissertation proposes a semi-supervised disentangled representation method based on a causal mechanism. The method exploits partial label information and introduces causal relations to constrain the representation, disentangling both supervised and unsupervised factors. Based on semantic labels and causal structure, a dedicated network is designed to separate semantic information from illumination information in image data. In downstream tasks, augmenting images with the disentangled illumination features alleviates the insufficient number and diversity of training samples, improving the recognition accuracy of the model and its robustness to illumination variation.

3. This dissertation constructs a universal illumination knowledge repository from the disentangled illumination representations, and proposes a cross-domain data augmentation and knowledge transfer method. To transfer illumination knowledge effectively to other domains, the repository is first built by cluster-based filtering and feature interpolation. A cross-domain data augmentation method then applies this repository to natural images in few-shot classification tasks. The method is compatible with few-shot baseline models under both transductive and inductive settings, and effectively improves their robustness to illumination interference. Furthermore, an illumination transfer method in image space is proposed.
By training a simple reconstruction network, the illumination features can be transferred to arbitrary target objects, synthesizing new images with plausible illumination.
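The repository construction and cross-domain augmentation described in contribution 3 can be sketched as follows. This is a simplified illustration with hypothetical function names: it stands in for the clustering step with a farthest-point selection, enlarges coverage by convex interpolation between selected prototypes, and assumes (for the sketch only) that illumination features compose additively with semantic features; the dissertation's actual clustering and composition operators may differ.

```python
# Sketch of building an illumination repository by prototype filtering
# plus feature interpolation, then using it for cross-domain augmentation.
import numpy as np


def build_illumination_repository(illum_feats, n_proto=16, n_interp=4, seed=0):
    """Select n_proto diverse prototypes (farthest-point selection as a
    stand-in for cluster-based filtering), then add n_interp convex
    interpolations between random prototype pairs."""
    rng = np.random.default_rng(seed)
    protos = [illum_feats[rng.integers(len(illum_feats))]]
    for _ in range(n_proto - 1):
        # distance from every feature to its nearest chosen prototype
        d = np.min([np.linalg.norm(illum_feats - p, axis=1) for p in protos], axis=0)
        protos.append(illum_feats[np.argmax(d)])  # pick the farthest point
    protos = np.stack(protos)
    # feature interpolation between prototype pairs enlarges coverage
    i, j = rng.integers(0, n_proto, (2, n_interp))
    lam = rng.uniform(0.0, 1.0, (n_interp, 1))
    interp = lam * protos[i] + (1.0 - lam) * protos[j]
    return np.concatenate([protos, interp])


def augment_with_illumination(semantic_feat, repository, rng):
    """Cross-domain augmentation: combine a semantic feature from the
    target domain with a sampled illumination feature (additive
    composition is an assumption of this sketch)."""
    illum = repository[rng.integers(len(repository))]
    return semantic_feat + illum
```

In a few-shot setting, each support example could be augmented several times with different repository entries, multiplying the effective number and illumination diversity of training samples without new labels.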