Research on Modeling Multi-Level User Interest in Recommender Systems (面向多层次用户兴趣的推荐系统建模方法研究)

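Author: Ding Jingtao (丁璟韬)
  • Student ID
    2015******
  • Degree
    Ph.D.
  • Email
    din******.cn
  • Defense date
    2020.05.19
  • Advisor
    Jin Depeng (金德鹏)
  • Discipline
    Information and Communication Engineering
  • Pages
    150
  • Confidentiality level
    Public
  • Department
    023 Department of Electronic Engineering
  • Chinese keywords
    推荐系统,用户兴趣,排序学习,负采样,上下文情境
  • English keywords
    Recommender System, User Interest, Learning to Rank, Negative Sampling, Context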

Abstract

In an era of information overload driven by big data, personalized recommender systems match users with information accurately by understanding the deeper interests behind user behavior, which improves service quality and increases business value. User interest modeling, one of the key technologies of recommender systems, is an important research topic that has received widespread attention. It faces three challenges. First, the explicit user interest that can be observed is limited, and existing methods struggle to characterize its relative strength effectively and at a fine granularity. Second, the implicit user interest hidden in missing data provides crucial negative feedback, yet efficient and reliable techniques for extracting it are still lacking. Third, users' dynamic interest is closely tied to the context they are in, but effective methods for modeling this relationship are missing. To address these challenges, this thesis studies three key problems: explicit interest modeling based on multi-behavior feedback data, implicit interest modeling based on negative sampling from missing data, and dynamic interest modeling that incorporates contextual information. It provides theoretical models and key technologies for modeling multi-level user interest in recommender systems. The main innovations and contributions are as follows.

First, at the level of explicit interest, this thesis studies a multi-behavior learning-to-rank methodology for explicit interest modeling. To address the inefficiency of modeling the pairwise ranking relations of user interest across different behaviors, I propose two solutions, one based on sampling-based learning and one based on whole-data learning. Experiments on real-world datasets collected from e-commerce platforms show that modeling the relative interest strength of different behaviors at a fine granularity improves recommendation performance by 15.7% to 38.8%. In addition, with a fast learning algorithm, the training time of the whole-data method grows only linearly with the size of the observed data, which makes it suitable for real application scenarios with massive data.

Second, at the level of implicit interest, this thesis improves the efficiency of negative sampling from missing data by designing a pruning strategy for the candidate sample space, which shrinks that space by a factor of 10 to 1000 without hurting recommendation performance. Further study reveals an essential weakness of existing negative sampling methods: they cannot effectively avoid false negatives. I therefore propose two methods for sampling reliable negatives, guided respectively by features of historical exposure data and by the prediction uncertainty of samples during training. Experiments on real-world datasets show that these methods improve ranking accuracy in user interest prediction by more than 5% over baseline methods and make the negative sampling process significantly more robust.

Third, at the level of dynamic interest, this thesis studies modeling methods that perceive contextual information comprehensively, so as to adaptively learn the differences and commonalities in how dynamic interest changes. From the temporal perspective, I propose a context-adaptive self-attention mechanism over behavior sequences that models temporal context at a fine granularity; applied to a sequential recommender system, it improves ranking accuracy by more than 4% over traditional methods. From the spatial perspective, I propose a recommendation framework based on spatial context that models user interest drift and transfer simultaneously, improving performance by 0.4% to 20.5% over existing methods.
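
As a rough illustration of the pairwise ranking idea behind the explicit-interest contribution, the sketch below samples item pairs whose behaviors imply different interest strengths and scores them with a BPR-style logistic loss. It is not the thesis's algorithm: the behavior ordering in `LEVELS`, the function names `sample_pair` and `bpr_loss`, and the toy data are all assumptions made for this example.

```python
import math
import random

# Assumed interest levels (hypothetical ordering, not taken from the thesis):
# a larger number means stronger interest; unobserved items count as level 0.
LEVELS = {"purchase": 3, "cart": 2, "view": 1}


def sample_pair(user_feedback, all_items, rng):
    """Draw (preferred, other) so that `preferred` has a strictly higher level.

    Assumes at least one item in `all_items` is unobserved for this user,
    so a lower-level candidate always exists.
    """
    preferred = rng.choice(sorted(user_feedback))
    level = LEVELS[user_feedback[preferred]]
    # Candidates are items with a strictly lower level, including unobserved ones.
    lower = [i for i in all_items
             if LEVELS.get(user_feedback.get(i), 0) < level]
    return preferred, rng.choice(lower)


def bpr_loss(score_preferred, score_other):
    """Pairwise logistic loss, -log(sigmoid(s_pref - s_other)), in stable form."""
    diff = score_preferred - score_other
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    rng = random.Random(0)
    feedback = {"phone": "purchase", "case": "view"}  # toy data
    items = ["phone", "case", "cable", "stand"]
    preferred, other = sample_pair(feedback, items, rng)
    print(preferred, other, bpr_loss(1.2, 0.3))
```

The loss shrinks as the preferred item's score rises above the other item's score, so minimizing it over many sampled pairs pushes a model to rank stronger behaviors above weaker or unobserved ones. Sampling one pair at a time corresponds loosely to the sampling-based strategy mentioned in the abstract; the whole-data strategy is not sketched here.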