Database Autonomous Materialized Views Based on Machine Learning

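Author: Han Yue (韩越)
  • Student ID
    2019******
  • Degree
    Doctoral
  • Email
    122******com
  • Defense date
    2024.05.23
  • Advisor
    Li Guoliang (李国良)
  • Discipline
    Computer Science and Technology
  • Pages
    138
  • Confidentiality level
    Public
  • Department
    024 Department of Computer Science
  • Chinese keywords
    自治物化视图;物化视图自动生成;物化视图动态管理;物化视图预测
  • English keywords
    Autonomous Materialized View Management; Automatic Materialized View Generation; Dynamic Materialized View Management; Materialized View Forecasting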

Abstract

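In the information age, society relies on database management systems to analyze massive volumes of data in the course of production. In databases, materialized views speed up queries by precomputing query results and have become an important query optimization technique. Existing materialized view techniques nevertheless have the following problems: (1) a high barrier to use: databases depend on database administrators to manage materialized views, which demands a high level of experience; (2) fast performance decay: the set of materialized views is maintained on long cycles and cannot adapt to changes in the query workload, so the optimization effect declines; (3) untimely optimization: materialized view generation lags behind newly submitted queries, so those queries cannot be optimized in time and run slowly. To address these problems, this thesis studies autonomous materialized view techniques. The main research content and results are as follows:

1. Automatic materialized view generation based on value estimation. To address existing databases' reliance on manual management of materialized views, this thesis proposes AutoView, a framework for automatic materialized view generation. The framework accurately estimates the value of materialized views and automatically selects for the user the set of materialized views with the greatest total value. First, the framework designs a feature encoding method for materialized views and an Encoder-Reducer model based on recurrent neural networks to accurately predict materialized view value. It then proposes a method based on a reinforcement learning model for solving the materialized view set selection problem, and designs a semantic vector generation method for materialized views to address the difficulty of representing the relationships among them. Experiments show that the materialized views generated by AutoView achieve a 14% better optimization effect than existing methods.

2. Dynamic materialized view management that adapts to the query workload. To address the decay of materialized view optimization performance under dynamic query workloads, this thesis proposes GnnMV, a dynamic materialized view management method that adapts to the query workload. The method dynamically estimates the value of materialized views in the query workload and maintains a set of high-value materialized views. First, the method builds an efficient dynamic query plan graph to model the dynamic query workload and designs a feature aggregation function to generate feature vectors for the nodes in the graph. It then proposes a method based on graph neural networks that uses these feature vectors to accurately estimate the real-time value of materialized views and to discover potentially high-value materialized views. Experiments show that the materialized views maintained by GnnMV under dynamic workloads achieve a 16% better optimization effect than traditional methods.

3. Event-aware time-series forecasting of materialized views. To address the poor user experience caused by materialized view optimization lagging behind newly submitted queries, this thesis proposes MVGPT, an event-aware time-series forecasting method for materialized views. The method predicts how queries will evolve and generates materialized views in advance, so that new queries are optimized effectively. First, the method builds a knowledge base of events related to the evolving query workload and designs a time-weighted event retrieval method. It then fine-tunes a large language model to predict the evolution trend of materialized views from the event knowledge. Finally, it trains a Bayesian network to model the features of materialized views and generate valid materialized views with a high hit rate. Experiments show that the materialized views predicted by MVGPT achieve a 48% higher hit rate on new queries than existing methods.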

In the information age, industries across the board use database management systems to analyze vast amounts of data in their production processes. Materialized views, which speed up queries by precomputing query results, have become an important technique in database query optimization. However, existing materialized view management methods still face the following issues: (1) High barrier to use: current databases rely on database administrators to manage materialized views, which requires a high level of experience; (2) Rapid performance degradation: the utilization of materialized views decreases as user query workloads change, reducing optimization performance; (3) Delayed creation: the generation of materialized views lags behind user queries, so new queries are not optimized in time and the user experience suffers. To address these issues, this study investigates autonomous materialized view techniques. Its main research content and findings are as follows:

1. Autonomous generation of materialized views based on value estimation. To address the reliance on manual management of materialized views in existing databases, this study proposes AutoView, an autonomous materialized view generation framework. The framework accurately estimates the value of materialized views and automatically selects, for the user, the set of materialized views with the maximum total benefit. It introduces a materialized view feature encoding method and an Encoder-Reducer model based on recurrent neural networks to accurately predict the optimization benefit of materialized views. It also proposes a materialized view selection method based on reinforcement learning, with a state representation enhanced by semantic vectors to address the difficulty of representing the complex relationships among materialized views. Experiments show that the materialized views generated by AutoView outperform existing methods by 14% in optimization benefit.

2. Dynamic and adaptive management of materialized views for query workloads. To address the rapid performance degradation of materialized views under dynamic query workloads, this study presents GnnMV, a dynamic materialized view management method. The method dynamically estimates the real-time benefit of materialized views in the query workload and efficiently maintains a high-quality set of materialized views. It proposes a dynamic query plan graph maintenance method that models dynamic query workloads efficiently, together with a feature aggregation function that produces rich feature vectors for materialized views. Graph neural networks are then used to accurately estimate the real-time benefit of materialized views and to discover potential high-benefit materialized views. Experiments show that the materialized views maintained by GnnMV under dynamic workloads achieve optimization benefits 16% higher than traditional methods.

3. Event-aware materialized view forecasting. To address the poor user experience caused by materialized view optimization lagging behind newly submitted queries, this study proposes MVGPT, an event-aware materialized view forecasting method. The method predicts how user queries will evolve and generates materialized views in advance, so that new queries are optimized effectively. It constructs an event knowledge base for evolving query workloads and proposes a time-weighted fuzzy retrieval method to supply evidence for trend prediction. It further proposes a materialized view prediction framework based on large language models and a generative materialized view model based on Bayesian networks, which together generate valid and effective materialized views. Experiments show that, under evolving query workloads, the materialized views predicted by MVGPT achieve a 48% higher hit rate on new queries than existing methods.
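
The selection step in contribution 1 (choosing the set of materialized views with the maximum total benefit) can be illustrated with a small sketch. This is not the reinforcement-learning method of AutoView: it substitutes a plain greedy benefit-per-cost heuristic as a baseline. The storage budget, the `Candidate` fields, the view names, and all numbers are hypothetical assumptions introduced only for illustration, and the sketch ignores interactions between views, which the abstract identifies as hard to represent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate materialized view (all fields are hypothetical)."""
    name: str
    benefit: float  # estimated query time saved, e.g. from a learned value model
    cost: float     # storage needed to materialize the view (assumed > 0)


def greedy_select(candidates, budget):
    """Pick views by descending benefit-per-cost while they fit in the budget."""
    chosen = []
    remaining = budget
    ranked = sorted(candidates, key=lambda c: c.benefit / c.cost, reverse=True)
    for cand in ranked:
        if cand.cost <= remaining:
            chosen.append(cand)
            remaining -= cand.cost
    return chosen


# Hypothetical candidates; benefits would come from a value estimator in practice.
candidates = [
    Candidate("mv_orders_by_day", benefit=40.0, cost=10.0),
    Candidate("mv_customer_join", benefit=90.0, cost=45.0),
    Candidate("mv_top_products", benefit=30.0, cost=5.0),
]
selected = greedy_select(candidates, budget=20.0)
print([c.name for c in selected])
```

A greedy ratio rule treats each view's benefit as independent of the others. When views overlap or one can be answered from another, that assumption fails, which is one reason a learned selection policy may do better.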