Multimodal graph data is one of the most common forms of graph data and is used to represent complex relationships among multimodal objects. Multimodal graph learning exploits multimodal information, cross-modal information, and graph structure information to optimize representations for downstream tasks such as chemical molecule classification, low-quality text detection in social networks, and compound interaction prediction. Owing to the diversity of multimodal graph data, constructing a multimodal graph neural network is a time-consuming and challenging task that requires specialized domain knowledge for efficient modeling and training. Moreover, real-world multimodal graph learning methods usually lack continual learning ability and out-of-distribution generalization ability, so the performance of multimodal graph neural networks degrades sharply when they are extended to dynamic tasks and unknown distributions. Although neural architecture search is increasingly widely used in deep learning, traditional neural architecture search methods are unsuitable for multimodal graph data because they ignore the message-passing mechanisms of multimodal nodes and the complex multimodal interactions. How to enable multimodal graph neural architecture search methods to learn continually from dynamic data and tasks, and how to generalize the optimal model to unknown data distributions, are also important problems. To address these challenges, more powerful multimodal graph neural architecture search methods are needed. In this thesis, we study how to automatically design multimodal graph neural architectures via neural architecture search under three scenarios: in-distribution learning, continual learning, and out-of-distribution learning. The main contributions are as follows:
1. To automatically search for the optimal multimodal graph neural network for a given task and dataset, we design a multimodal graph neural architecture search method that simultaneously searches for the single-modal graph feature propagation strategy and the multimodal fusion strategy. We design a new search space for multimodal graph neural networks that covers state-of-the-art multimodal graph neural network methods.
2. To address the catastrophic forgetting problem in multimodal graph neural architecture search, we propose an adaptive multimodal graph neural network model with a sharing strategy that avoids unnecessary architecture expansion for similar tasks. In addition, we develop a structure-evolving continual multimodal graph learning model that can adaptively explore model architectures without damaging useful historical information.
3. To address the out-of-distribution generalization problem in multimodal graph neural architecture search, we propose a multimodal graph neural architecture search method under out-of-distribution generalization. First, we propose a multimodal feature decorrelation technique to decouple invariant features from variant features. Second, we optimize global sample weights to automatically search for the multimodal graph neural network model with the best generalization ability.
This thesis provides new methods for multimodal graph neural architecture search that can significantly improve multimodal graph learning in real-world scenarios. It also offers insights and new directions for future research on multimodal graph learning.
Multimodal graph data is one of the most common forms of graph data and is used to represent complex relationships between multimodal objects. Multimodal graph learning leverages multimodal information, cross-modal information, and graph structure to optimize representations for downstream tasks such as chemical molecule classification, low-quality text detection in social networks, and compound interaction prediction. Due to the diversity of multimodal graph data, constructing a multimodal graph neural network (MGNN) is a time-consuming and challenging task that requires specialized domain knowledge to build and train models efficiently. Furthermore, multimodal graph learning methods often lack continual learning ability and out-of-distribution generalization capability, so their performance degrades significantly on sequential tasks and unknown data distributions. Although neural architecture search (NAS) has become increasingly popular in deep learning, traditional NAS methods are not suitable for multimodal graph data because they overlook the message-passing mechanisms of multimodal nodes and the complex interactions between modalities. How to enable multimodal graph neural architecture search to learn continually from dynamic data and tasks, and how to extend the optimal model to unknown data distributions, remain important open problems. To tackle these challenges, more powerful multimodal graph neural architecture search methods are needed. In this work, we automate the design of MGNNs via NAS under three scenarios: in-distribution (ID), continual, and out-of-distribution (OOD) learning. We summarize the main contributions as follows:
1. To automatically search for the best MGNN model for a specific task and dataset, we design a multimodal graph neural architecture search (MGNAS) method that simultaneously searches for the single-modal graph feature propagation strategy and the multimodal fusion strategy (illustrated by the sketch after this abstract). We design a new search space for MGNNs that covers state-of-the-art MGNN methods.
2. To address the catastrophic forgetting problem in multimodal graph neural architecture search, we propose an adaptive MGNN (AdaMGNN) model with a sharing strategy that avoids unnecessary architecture extensions for similar tasks. Additionally, we develop a structure-evolving continual multimodal graph learning (SCMGL) model that can adaptively explore model architectures without harming useful historical information.
3. To solve the OOD generalization problem in multimodal graph neural architecture search, we propose an out-of-distribution multimodal graph neural architecture search (OMG-NAS) method. First, we propose a multimodal graph feature decorrelation (MGFD) technique to decouple invariant features from variable features. We then optimize global sample weights to automatically search for the MGNN model with the best generalization performance.
This work provides new methods for multimodal graph neural architecture search that can significantly enhance the effectiveness of multimodal graph learning in real-world scenarios. Moreover, it offers insights and new directions for future research on multimodal graph learning.
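To make the idea of jointly searching over single-modal propagation and multimodal fusion (contribution 1) concrete, the following is a minimal sketch, assuming a DARTS-style continuous relaxation over a small set of hypothetical candidate operators. The operator names in PROPAGATION_OPS and FUSION_OPS, the dense-adjacency propagation, and the MixedOp class are illustrative assumptions for exposition only, not the actual MGNAS search space or implementation.

```python
# Illustrative sketch only: a differentiable mixture over hypothetical
# single-modal propagation and multimodal fusion candidates.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gcn_propagate(x, adj):
    # Simple mean aggregation over neighbours given a dense adjacency matrix.
    deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
    return (adj @ x) / deg


# Hypothetical candidate sets; a real MGNAS search space would be richer.
PROPAGATION_OPS = {
    "identity": lambda x, adj: x,                        # skip propagation
    "gcn": gcn_propagate,                                # aggregate neighbours
    "residual": lambda x, adj: x + gcn_propagate(x, adj),
}

FUSION_OPS = {
    "sum": lambda a, b: a + b,
    "max": lambda a, b: torch.maximum(a, b),
    "gate": lambda a, b: torch.sigmoid(a) * b,           # gated fusion
}


class MixedOp(nn.Module):
    """Softmax-weighted mixture over a candidate set, so the discrete
    architecture choice is relaxed and can be optimised by gradients."""

    def __init__(self, ops):
        super().__init__()
        self.ops = list(ops.values())
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, *inputs):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(*inputs) for w, op in zip(weights, self.ops))


# Usage: per-modality propagation followed by cross-modal fusion.
n, d = 8, 16
adj = (torch.rand(n, n) > 0.7).float()
x_text, x_img = torch.randn(n, d), torch.randn(n, d)

prop_text, prop_img = MixedOp(PROPAGATION_OPS), MixedOp(PROPAGATION_OPS)
fuse = MixedOp(FUSION_OPS)
z = fuse(prop_text(x_text, adj), prop_img(x_img, adj))  # fused node embeddings
print(z.shape)  # torch.Size([8, 16])
```

Relaxing each discrete choice into a softmax mixture lets the architecture parameters alpha be trained jointly with the network weights; after search, the highest-weighted propagation operator per modality and the highest-weighted fusion operator would be retained as the discovered architecture.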