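Against the backdrop of information overload in the era of big data, personalized recommender systems are widely used on Internet platforms. By understanding the deeper interests behind user behavior, they help users find the content and information they need, and they have become one of the fundamental technologies supporting the intelligent Internet. In recent years, deep learning and neural networks have driven further progress in recommendation models. However, building accurate and efficient neural recommender systems remains one of the major problems facing both academia and industry. Many studies tend to build complex neural network models to improve recommendation quality while neglecting training efficiency. Meanwhile, the widely adopted negative-sampling training strategy makes it difficult for a recommendation model to reach its optimal state. The differing data characteristics across the many application scenarios of recommender systems, together with the recent demand for data to be forgettable, pose even more complex challenges to model performance and efficiency. Given how broadly recommender systems are applied and how large their impact is, research on accurate and efficient neural recommendation is both urgent and important. This thesis investigates three aspects: efficient non-sampling learning from implicit data, efficient multi-scenario recommendation modeling, and an efficient framework for recommendation unlearning. The main contributions and innovations are as follows. First, this thesis studies how to achieve accurate and efficient non-sampling learning from massive implicit data. From the perspective of model training, and through rigorous mathematical derivation, it designs a series of efficient non-sampling learning algorithms applicable to neural network training, which reduce the time complexity of learning from the full data by an order of magnitude. Experiments on real-world datasets show that, compared with existing methods, the proposed algorithms improve recommendation accuracy by more than 5% and speed up training by 5 to 30 times. This research is also foundational: in essence, any machine learning task or algorithm that learns from data containing only positive interactions can use this solution. Second, building on the proposed efficient non-sampling learning algorithms and targeting the data characteristics of different application scenarios, this thesis studies how to efficiently build fine-grained neural recommendation models that combine heterogeneous interaction behavior (social networks, multiple types of interaction behavior) with content information (feature information, knowledge graphs). On multiple real-world datasets, the proposed models achieve significant improvements over existing baselines in both model performance and training efficiency. For example, on the recommendation task that incorporates multi-behavior data, recommendation accuracy improves by more than 40% and training is more than 10 times faster. Third, going one step beyond efficient non-sampling recommendation models, this thesis addresses the demand for forgettable data and studies the problem of efficient recommendation unlearning: when a data-forgetting request is received (for example, a user asks for their data to be deleted, or bad data must be removed), the model should be able to quickly remove the influence of that data on what it has learned. This thesis designs a general, efficient framework for recommendation unlearning. Experiments on real-world datasets show that the proposed method achieves efficient data forgetting while preserving recommendation accuracy.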
Against the backdrop of information overload in the era of Big Data, personalized recommender systems have been widely deployed in Web applications to help users find desired information and items by learning their latent preferences from behavioral data. They have become one of the fundamental techniques supporting web intelligence. In recent years, deep learning and neural networks have driven the further development of recommendation models. However, efficient and effective neural recommendation remains an important open problem for both the research community and practical applications. Many deep learning studies focus only on obtaining better results and ignore the computational cost of reaching the reported accuracy. Meanwhile, the performance of existing recommendation methods is limited by the inherent weaknesses of the widely used sampling-based learning strategy. The differing data characteristics of various recommendation scenarios and the recent demand for erasable machine learning also pose more complex challenges to the effectiveness and efficiency of neural recommendation models. Considering the wide application and great impact of recommender systems, research on efficient and effective neural recommendation is urgent and critical. In this thesis, we conduct research on three aspects: efficient non-sampling learning, efficient recommendation models for various scenarios, and an efficient recommendation unlearning framework. The main contributions and innovations are as follows. First, this work studies the basic problem of how to realize efficient and effective non-sampling learning from massive implicit data. Through rigorous mathematical analysis, we derive several new optimization methods, which resolve the computational bottlenecks of whole-data optimization by leveraging the sparsity of implicit data.
Experiments on real-world datasets show that the proposed methods outperform state-of-the-art methods by more than 5% in accuracy and train 5 to 30 times faster. This research is also a fundamental study in machine learning, with the potential to benefit many tasks (not limited to recommendation) in which only positive data is observed. Second, based on the proposed efficient non-sampling learning methods, this work studies how to efficiently build fine-grained neural recommendation models that incorporate social networks, multi-behavior data, context information, and knowledge graphs. The proposed models significantly outperform state-of-the-art models in both recommendation performance and training efficiency. For example, in the multi-behavior recommendation scenario, we achieve more than 40% better performance and more than 10 times faster training than existing methods. Third, in response to the recent demand for erasable machine learning, this work studies the problem of efficient recommendation unlearning. In many cases, a recommender system also needs to forget certain sensitive data and its complete lineage. From the perspective of privacy, several recently enacted privacy regulations require systems to eliminate any impact of data whose owner requests that it be forgotten. From the perspective of utility, if a system’s utility is damaged by bad data, the system needs to forget such data to regain utility. This work proposes a general machine unlearning framework tailored to recommendation tasks, which not only achieves efficient unlearning but also outperforms state-of-the-art machine unlearning methods in terms of recommendation utility.
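The first contribution relies on the sparsity of implicit data to learn from all user-item pairs without negative sampling. The following is only a minimal sketch of that general idea, not the thesis's own algorithm. It assumes a plain matrix-factorization scorer (the inner product of a user and an item embedding), a weighted squared loss with weight 1 on observed pairs, and a uniform weight `c_neg` on all unobserved pairs; none of these choices is stated in the abstract. The names `naive_loss`, `efficient_loss`, and `c_neg` are hypothetical. Under these assumptions the sum over all pairs can be rewritten as a sum over observed pairs plus a term built from two small Gram matrices, so the cost no longer scales with the number of users times the number of items.

```python
import numpy as np


def naive_loss(P, Q, pos, c_neg):
    """Whole-data loss computed by scoring every user-item pair: O(|U||I|d)."""
    Y = P @ Q.T                                  # all predicted scores
    R = np.zeros_like(Y)
    R[pos[:, 0], pos[:, 1]] = 1.0                # observed (positive) pairs
    W = np.where(R > 0, 1.0, c_neg)              # assumed uniform negative weight
    return float(np.sum(W * (R - Y) ** 2))


def efficient_loss(P, Q, pos, c_neg):
    """Same loss using only observed pairs and two d x d Gram matrices."""
    # Scores of the observed pairs only: O(|R+| d).
    y_pos = np.sum(P[pos[:, 0]] * Q[pos[:, 1]], axis=1)
    # Positive-pair loss, minus the negative-weight term that the
    # all-pairs sum below charges to those same pairs.
    pos_term = np.sum((1.0 - y_pos) ** 2 - c_neg * y_pos ** 2)
    # Sum of y_ui^2 over ALL pairs equals sum((P^T P) * (Q^T Q)): O((|U|+|I|) d^2).
    all_term = c_neg * np.sum((P.T @ P) * (Q.T @ Q))
    return float(pos_term + all_term)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_users, n_items, d = 50, 40, 8              # toy sizes, chosen arbitrarily
    P = rng.normal(size=(n_users, d))            # user embeddings
    Q = rng.normal(size=(n_items, d))            # item embeddings
    # Draw 100 distinct observed (user, item) pairs.
    flat = rng.choice(n_users * n_items, size=100, replace=False)
    pos = np.stack(np.divmod(flat, n_items), axis=1)
    print(naive_loss(P, Q, pos, 0.1), efficient_loss(P, Q, pos, 0.1))
```

The two functions should agree up to floating-point error; only the second avoids materializing the full score matrix. The rewrite assumes each observed pair appears once in `pos`.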