
Research on Intention-Aware Sequential Recommendation Methods

Author: 倪诗影
  • Student ID
    2017******
  • Degree
    Doctoral
  • Email
    nis******com
  • Defense Date
    2023.05.23
  • Supervisor
    李乐飞
  • Discipline
    Management Science and Engineering
  • Pages
    128
  • Confidentiality Level
    Public
  • Department
    016 Department of Industrial Engineering
  • Keywords (Chinese)
    推荐系统, 序列推荐, 意图感知, 深度学习
  • Keywords (English)
    recommendation system, sequential recommendation, intention-aware, deep learning

Abstract

The rapid development of the Internet has made everyday life more convenient, but it has also made information explosion an increasingly serious problem. Recommendation systems are one of the key techniques for handling information overload and play a crucial role in filtering information and supporting decision-making. Among them, sequential recommendation, which captures the temporal dynamics of user preferences, has attracted wide attention and research interest. Driven by massive data and complex interaction scenarios, sequential recommenders face the challenges of sparse user-item interactions, drifting user behaviors, and intricate sequential dependencies. To address these challenges, we propose to exploit intention information to facilitate the prediction of users' future preferences, and we study intention-aware sequential recommenders. The main contributions of this work are as follows.

(1) To address the mixing and drifting of item-level interests and the sparsity of user-item interactions, we propose an intention-aware, Markov-chain-based sequential recommendation model. The model decomposes user preferences into short-term intentions and long-term preferences. An intention is represented as a (category, behavior type) pair, and intention transitions are modeled as a high-order Markov chain approximated by a factorized mixture transition distribution. Long-term item-level preferences are modeled by matrix factorization, and the prediction layer fuses the short-term intention-level and long-term item-level preferences for recommendation. Experiments on several real-world e-commerce datasets show that the model outperforms the baseline models while offering good scalability and training efficiency.

(2) To improve item representation and the modeling of sequential dynamics, we propose an intention-guided sequential recommendation model built on attention networks.
We design a reweighted self-attention network and use it to build a behavior-aware intention encoder, which takes the behavior-type sequence as auxiliary information and learns intention-preference representations from the category sequence. In the item encoder, the intention-preference representation guides an attention network to learn item-preference representations from the item sequence. The prediction layer fuses item-level and intention-level preferences to make recommendations. Experimental results on several real-world e-commerce datasets demonstrate that the proposed model outperforms all baseline models, and ablation studies verify the effectiveness of the behavior-aware intention encoder, the intention-guided item encoder, and the prediction layer.

(3) To address the poor training stability and unsatisfactory performance caused by sparse interactions and intricate user behaviors in deep-reinforcement-learning-based (DRL-based) recommenders, we propose an intention-assisted, DRL-based sequential recommendation model. The recommendation task is formulated as a Markov decision process and solved with a policy-gradient method. Intention information is used to enhance the design of both the state-representation network and the policy network, and supervised learning signals and reward baselines are introduced to stabilize training and improve performance. Experiments on several real-world e-commerce datasets show that the model outperforms all baselines, and ablation studies validate the effectiveness of the intention information, the supervised learning signals, and the reward baselines.
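As a rough illustration of contribution (1), the following minimal numpy sketch shows how a high-order intention transition can be approximated by a factorized mixture transition distribution and fused with a matrix-factorization score. All dimensions, the mixture weights `w`, and the `item_to_intent` mapping are toy assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_intents = 6      # intentions = flattened (category, behavior-type) pairs
order = 3          # Markov order (number of past steps mixed)
d = 4              # latent dimension of the factorization

# Factorized per-lag transition: P(next | prev) ∝ exp(E_in[prev] · E_out[next]),
# mixed over the last `order` steps (mixture transition distribution).
E_in = rng.normal(scale=0.1, size=(n_intents, d))
E_out = rng.normal(scale=0.1, size=(n_intents, d))
w = np.full(order, 1.0 / order)   # mixture weights over lags (toy: uniform)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def intent_scores(history):
    """Mixture of factorized first-order transitions over recent intentions."""
    score = np.zeros(n_intents)
    for lag, prev in enumerate(history[-order:][::-1]):
        score += w[lag] * softmax(E_out @ E_in[prev])
    return score

# Long-term item-level preferences via matrix factorization.
n_items = 10
U = rng.normal(scale=0.1, size=d)             # one user's latent factor
V = rng.normal(scale=0.1, size=(n_items, d))  # item latent factors
item_to_intent = rng.integers(0, n_intents, size=n_items)

def recommend(history, alpha=0.5):
    """Fuse short-term intention scores with long-term MF scores."""
    s_intent = intent_scores(history)[item_to_intent]  # map items to intentions
    s_long = V @ U
    return np.argsort(-(alpha * s_intent + (1 - alpha) * s_long))

ranking = recommend([2, 0, 5])   # recent intention history (toy ids)
print(ranking[:3])               # ids of the top-ranked items
```

In the full model the mixture weights and factors would be learned from interaction data; here they are random placeholders so the fusion logic can be run end to end.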
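For contribution (2), a single-head numpy sketch of the behavior-reweighted self-attention idea follows: behavior-derived weights rescale attention over the category sequence to produce an intention representation, which then queries the item sequence. The embeddings, the reweighting scheme, and the additive fusion are simplified assumptions, not the thesis's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
d, seq = 8, 5    # embedding dimension, sequence length (toy sizes)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy inputs: category embeddings, behavior-derived weights, item embeddings.
cat_emb = rng.normal(size=(seq, d))
beh_weight = softmax(rng.normal(size=seq))   # importance of each behavior type
item_emb = rng.normal(size=(seq, d))

def self_attention(X, reweight=None):
    """Single-head self-attention; `reweight` rescales attention to each key."""
    A = softmax(X @ X.T / np.sqrt(d), axis=-1)
    if reweight is not None:
        A = A * reweight                    # behavior-aware reweighting of keys
        A = A / A.sum(axis=-1, keepdims=True)
    return A @ X

# Intention encoder: behavior-reweighted self-attention over categories;
# take the last position as the intention-preference representation.
intent_repr = self_attention(cat_emb, reweight=beh_weight)[-1]

# Item encoder: the intention representation queries the item sequence.
attn = softmax(item_emb @ intent_repr / np.sqrt(d))
item_repr = attn @ item_emb

# Prediction layer: fuse item-level and intention-level preferences (toy: sum).
fused = item_repr + intent_repr
print(fused.shape)
```

In practice the fused vector would be scored against candidate item embeddings; the sketch stops at the representation level to keep the attention flow visible.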
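For contribution (3), the policy-gradient optimization with a reward baseline might be sketched as a plain REINFORCE loop. The fixed state vector, the toy reward (treating item 0 as the positive item), and the running-mean baseline are illustrative stand-ins for the thesis's learned state-representation network and e-commerce reward signal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, d = 5, 4                   # 5 candidate items, 4-dim state

theta = rng.normal(scale=0.1, size=(n_actions, d))  # policy parameters
state = rng.normal(size=d)            # stand-in for the learned state repr.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_step(theta, state, baseline, lr=0.1):
    """One REINFORCE update; the baseline reduces gradient variance."""
    probs = softmax(theta @ state)
    a = rng.choice(n_actions, p=probs)        # sample a recommendation
    reward = 1.0 if a == 0 else 0.0           # toy reward: item 0 is positive
    grad = -np.outer(probs, state)            # ∇ log π(a|s) for all actions...
    grad[a] += state                          # ...plus the chosen-action term
    return theta + lr * (reward - baseline) * grad, reward

rewards = []
for _ in range(300):
    baseline = np.mean(rewards) if rewards else 0.0  # running reward baseline
    theta, r = reinforce_step(theta, state, baseline)
    rewards.append(r)

print(softmax(theta @ state))   # learned recommendation probabilities
```

Subtracting the running-mean baseline leaves the expected gradient unchanged while shrinking its variance, which is the stabilizing effect the abstract refers to; the thesis additionally adds supervised learning signals, which this sketch omits.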