Data-driven Performance Modeling and Optimization for Datacenter Networks

Author: Wang Mowei (王莫为)
  • Student ID
    2017******
  • Degree
    Doctorate
  • Email
    wan******com
  • Defense date
    2022.05.19
  • Advisor
    Cui Yong (崔勇)
  • Discipline
    Computer Science and Technology
  • Pages
    119
  • Confidentiality level
    Public
  • Department
    024 Department of Computer Science
  • Keywords
    Datacenter networks, Data-driven, Performance modeling, Topology reconfiguration, Shared-buffer management

Abstract

In recent years, the rapid development of datacenter networks has provided important support for cloud computing services. As network applications demand higher transmission performance, datacenters need better algorithms, protocols, and systems to improve network performance, which in turn relies on accurate and efficient performance evaluation of new designs. Traditional performance modeling methods based on mathematical analysis and discrete-event simulation have shortcomings in speed, accuracy, and ease of use. Recently, the rapid development of data-driven methods, represented by deep learning, has opened new possibilities for network modeling and performance optimization: such methods avoid manual involvement and automatically learn the complex mappings between network entities from data. However, current data-driven methods were not designed to solve networking problems, and they still face many challenges, such as diverse network system characteristics, a wide range of performance-influencing factors, and a large network state space.

In response to the opportunities and challenges that data-driven methods bring to datacenter network performance modeling and optimization, this thesis targets efficient network performance modeling and studies the key factors affecting network performance from three perspectives: global network topology, individual network devices, and time-series network traffic. The main contributions of this thesis are as follows:

- A deep learning-based datacenter network topology modeling and optimization scheme, xWeaver, is proposed. To construct training samples quickly, a performance model based on a separate convolutional neural network is designed to evaluate topologies rapidly. A network topology mapping module that supports domain knowledge embedding is also designed; after offline training, it can quickly generate a high-performance topology configuration for a given traffic demand. Experimental results based on optical circuit switches and large-scale simulations show that xWeaver can quickly and accurately infer topology performance, and that the generated topology configurations effectively improve network transmission performance.
- A deep reinforcement learning-based buffer management scheme, NDT, is proposed for shared-memory switches. To deal with the high complexity of buffer management, a scalable reinforcement learning model based on permutation-equivariant neural networks is designed by exploiting the permutation symmetry of switch ports. To achieve efficient training and decision-making, a two-level control scheme is designed that combines domain knowledge-based action encoding with a cumulative event-triggered mechanism. Experimental results based on a DPDK switch prototype and simulations show that NDT generally outperforms widely used heuristic algorithms and generalizes well.
- A graph neural network-based network performance modeling framework, xNet, is proposed. To deal with the complex relationships among network configurations, a network system abstraction method based on heterogeneous relational graphs is proposed. Building on this efficient representation, a configurable graph neural network structure that supports flow-level time-series modeling is designed. Experimental results on three typical network scenarios show that xNet can make accurate time-series predictions of flow-level performance metrics, and that it evaluates much faster than traditional simulators.
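
The permutation symmetry of switch ports that the abstract refers to can be illustrated with a minimal sketch. This is not NDT's actual model, which the abstract does not specify; it is a generic permutation-equivariant layer of the form y_i = w_self * x_i + w_mean * mean(x) + bias, applied to a hypothetical list of per-port queue lengths. The names `equivariant_layer` and `permute`, the weights, and the input values are all assumptions made for illustration.

```python
from typing import List


def equivariant_layer(x: List[float], w_self: float, w_mean: float, bias: float) -> List[float]:
    """Hypothetical layer: y_i = w_self * x_i + w_mean * mean(x) + bias.

    The same weights are shared by every port, and ports interact only
    through the mean, which does not depend on port order.
    """
    mean = sum(x) / len(x)
    return [w_self * xi + w_mean * mean + bias for xi in x]


def permute(values: List[float], order: List[int]) -> List[float]:
    """Reorder values so that position k holds values[order[k]]."""
    return [values[i] for i in order]


# Assumed per-port queue lengths for a 4-port switch (illustrative values).
queue_lengths = [3.0, 0.0, 7.0, 2.0]
order = [2, 0, 3, 1]

# Equivariance: relabeling the ports before or after the layer gives the same result.
out_then_perm = permute(equivariant_layer(queue_lengths, 0.5, -0.25, 1.0), order)
perm_then_out = equivariant_layer(permute(queue_lengths, order), 0.5, -0.25, 1.0)
print(out_then_perm == perm_then_out)
```

Because the weights are shared across ports, the number of parameters in such a layer does not grow with the port count, which is one plausible reason a design of this kind scales to larger switches.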