基于图神经网络的足球比赛结果预测方法研究

Soccer Match Outcome Prediction Based on Graph Neural Network

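Author: 杨喜凯
  • Student ID
    2020******
  • Degree
    Master's
  • Defense date
    2023.05.12
  • Advisor
    刘红岩
  • Discipline
    Management Science and Engineering
  • Pages
    74
  • Confidentiality level
    Public
  • Department
    051 School of Economics and Management
  • Chinese keywords
    足球比赛结果预测,图神经网络,深度学习,时间序列分析
  • English keywords
    Soccer Outcome Prediction, Graph Neural Network, Deep Learning, Time Series Analysis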

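Abstract

Soccer is one of the most popular team sports in the world, attracting hundreds of millions of viewers every year. Predicting the outcomes of future matches matters to fans, individual players, teams, and league operations. To date, the global soccer betting market has exceeded one trillion in value, so accurate prediction of match outcomes has substantial economic and industrial value. Research on prediction models can also enrich artificial intelligence models and algorithms, which gives this study theoretical significance.

This thesis first explores the largest public match-log dataset currently available, extracting the frequency of each type of event for teams and players during matches, and examines how important each event type in past matches is for predicting the outcome of the next match. The results show that features such as the historical numbers of accurate passes and accurate shots for both home and away teams, and the historical numbers of smart passes and goal kicks for away teams, play an important role in predicting match outcomes.

Most existing studies take the single team as the unit of analysis, ignoring the different player lineups in each match and the information about the players themselves. In addition, the fixed-size input vectors required by the vast majority of prediction models limit the exploration and use of more unstructured data. This thesis proposes ST-GAT, a graph-neural-network-based model for predicting soccer match outcomes, which can effectively handle unstructured data such as match logs and model matches at a finer granularity. Unlike most models, which capture only spatial or only temporal information, ST-GAT contains two sub-models: SGAT captures the interactions between players spatially, while TGAT takes temporal information into account and captures the dynamic competitive relationships between teams. The two sub-models are learned jointly during training and capture complementary sources of information. This thesis also introduces a model of the role-based (positional) interactions between players of different teams, a factor that previous work in this area has not considered. Experimental results show that the proposed model significantly outperforms the other baseline models and that its performance is stable, with good predictions across the major leagues. Experiments further verify the effect of each sub-model in ST-GAT and confirm that the two are complementary in their information sources: SGAT contains more predictive information, while TGAT effectively fills in part of the information that SGAT misses. Additional experiments verify the contribution of each type of feature to predictive performance, showing the importance of global match-state features and of interaction information between players of different teams.

This thesis also explores some problems that may arise when training graph neural networks, such as over-smoothing and limited expressive power, and finds that random edge dropout on the graph structure improves training. Finally, it extracts the embedding vectors learned by the prediction model, performs a basic analysis of them, and points out other possible applications of the embeddings.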
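The event-frequency features mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical log of `(match_id, team, event_type)` rows. The field layout, the event labels, and the helper names `event_counts` and `to_vector` are invented for illustration and are not taken from the dataset or from the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical match-log rows: (match_id, team, event_type).
# The layout and event labels are illustrative, not the dataset's real schema.
events = [
    (1, "home", "accurate_pass"),
    (1, "home", "accurate_pass"),
    (1, "home", "accurate_shot"),
    (1, "away", "smart_pass"),
    (1, "away", "goal_kick"),
    (1, "away", "accurate_pass"),
]


def event_counts(rows):
    """Count how often each event type occurs per (match, team)."""
    counts = defaultdict(Counter)
    for match_id, team, event_type in rows:
        counts[(match_id, team)][event_type] += 1
    return counts


def to_vector(counter, vocabulary):
    """Turn a Counter into a feature vector with a fixed event order."""
    return [counter.get(name, 0) for name in vocabulary]


vocabulary = ["accurate_pass", "accurate_shot", "smart_pass", "goal_kick"]
counts = event_counts(events)
home_features = to_vector(counts[(1, "home")], vocabulary)
away_features = to_vector(counts[(1, "away")], vocabulary)
print(home_features)  # [2, 1, 0, 0]
print(away_features)  # [1, 0, 1, 1]
```

Historical features for an upcoming match could then be built by aggregating such per-match vectors over a team's earlier matches. How the thesis aggregates them is not stated in the abstract.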

Soccer is one of the most popular team sports in the world, attracting hundreds of millions of viewers every year. Predicting soccer match outcomes is important to fans, individual players, teams, and leagues. The soccer betting market has already exceeded one trillion dollars in value, so accurately predicting match outcomes has great economic and industrial value. Research on prediction models can also enrich the models and algorithms of artificial intelligence, which gives this thesis theoretical significance.

This thesis explores the largest public dataset of soccer match logs, extracts the counts of various kinds of events from the log data, and examines the importance of these events in predicting future match outcomes. The results demonstrate that features such as the historical numbers of accurate simple passes and accurate shots for both home and away teams, as well as the historical numbers of smart passes and goal kicks for away teams, play a significant role in predicting future match outcomes.

Most prior research models individual teams, ignoring the different lineups and individual player information in each match. Most models also assume fixed-size vector input and cannot handle unstructured data well. This thesis proposes a soccer prediction model based on a graph neural network, called ST-GAT, which handles unstructured data such as match logs and models relationships at a finer granularity. Unlike most models, which capture only spatial or only temporal information, ST-GAT consists of two sub-models: SGAT captures the interactions between players spatially, while TGAT models the dynamic relationships between teams over time. The two sub-models are trained jointly and capture complementary information. Furthermore, this thesis models the role-based interactions between players from different teams, which previous research has not done.

The experiments demonstrate that the proposed model significantly outperforms the other benchmark models and has stable prediction performance, performing well across the major leagues. The SGAT model has stronger predictive performance, and TGAT captures information that SGAT cannot. Through experiments, this thesis verifies the effect of various features on the model's prediction performance and demonstrates the importance of global match-state features and role-based player interaction features.

This thesis also explores potential issues that may arise when training graph models, such as over-smoothing and limited expressive power, and finds that edge dropout helps improve model performance whereas node dropout does not. Finally, it performs a basic analysis of the embeddings extracted from the model and proposes potential research directions that use them.
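As a rough illustration of the edge dropout mentioned above, the sketch below removes each edge of a graph independently with a fixed probability at every training step and keeps the full graph at evaluation time. This is one common formulation and is only an assumption here, since the abstract does not specify the exact scheme used in the thesis. The function name `drop_edges`, the toy edge list, and the drop probability are hypothetical.

```python
import random


def drop_edges(edges, drop_prob, rng):
    """Return the edges that survive independent removal with probability drop_prob."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # rng.random() is uniform on [0, 1), so an edge is kept with probability 1 - drop_prob.
    return [edge for edge in edges if rng.random() >= drop_prob]


# Hypothetical player-interaction graph: directed edges between player indices.
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 0), (0, 2)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# During training, a fresh random subgraph would be sampled at each step.
for step in range(3):
    kept = drop_edges(edges, drop_prob=0.3, rng=rng)
    print(step, kept)

# At evaluation time the full edge list is used (drop_prob = 0 keeps every edge).
full = drop_edges(edges, drop_prob=0.0, rng=rng)
```

Because the node set is untouched and only the message-passing paths change, this differs from node dropout, which the abstract reports did not help.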