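Micro-burst traffic is common in data center networks: it is intense bursty traffic that lasts only a very short time. It has two main causes: the batching strategies of end systems and the fan-in communication pattern of data center applications. Batching sends packets into the network in batches, which forms micro-bursts. Such micro-bursts usually last less than one network round-trip time, so senders cannot react to them in time; this thesis therefore calls them non-responsive micro-burst traffic. In the fan-in pattern, by contrast, many senders transmit to the same receiver, and when their data converges on a switch queue the queue grows sharply, which also forms a micro-burst. Such micro-bursts usually last longer than one round-trip time, so their formation is tied to how the senders respond to them; this thesis therefore calls them responsive micro-burst traffic. This thesis studies four aspects of responsive and non-responsive micro-burst traffic.
For responsive micro-burst traffic, the main work and results are:
(1) Observation and analysis of the dynamic behavior of responsive micro-burst traffic. Through experiments, this thesis observes and analyzes the dynamic behavior of micro-burst traffic in five typical data center communication scenarios. In all scenarios, the slope of the switch queue length characterizes the dynamics of the micro-burst, and traditional schemes for eliminating bursty traffic do not apply to responsive micro-burst traffic.
(2) S-ECN, a fast response mechanism for micro-burst traffic. Inspired by the experimental observations, this thesis designs the S-ECN marking mechanism, which uses the slope of the switch queue length to mark packets with ECN probabilistically, so that the sources slow down appropriately. Experimental results show that S-ECN effectively suppresses the formation of micro-burst traffic.
For non-responsive micro-burst traffic, the main work and results are:
(1) Analysis and improvement of the dynamic threshold policy. This thesis analyzes, for the dynamic threshold (DT) policy, the sufficient conditions under which micro-burst traffic causes packet loss and the amount of buffer that is still free at that point. Based on this analysis, it proposes the EDT mechanism to absorb as much micro-burst traffic as possible. Experiments and simulations show that EDT absorbs more micro-burst traffic and thereby shortens the completion time of applications.
(2) Analysis and mitigation of the ECN mismarking problem. This thesis finds that non-responsive micro-burst traffic causes ECN mismarking and analyzes the problem theoretically. Guided by the analysis, it proposes the CEDM marking mechanism to reduce ECN mismarking. Extensive experiments and simulations verify that CEDM significantly reduces the throughput loss caused by ECN mismarking.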
Micro-burst traffic is a common traffic pattern in data center networks. It is highly intense traffic that appears within a very short period. There are two main causes of micro-burst traffic: the fan-in communication pattern and system batching. In the fan-in communication pattern, a large number of senders send data to the same receiver. When this data aggregates in a switch queue, the queue length increases rapidly. The duration of this kind of micro-burst is often longer than the round-trip time of the network, so its dynamic behavior depends on how the senders respond to the network. We therefore call it responsive micro-burst traffic. With batching, on the other hand, packets are sent into the network in batches, which also results in micro-burst traffic. The duration of this kind of micro-burst is often shorter than the round-trip time of the network, which means that the sender has no time to respond to it. We therefore call it non-responsive micro-burst traffic. In this thesis, we investigate both kinds of micro-burst traffic in detail.
For responsive micro-burst traffic, our studies are as follows:
(1) We observe and analyze the dynamic behavior of micro-burst traffic. By observing the fine-grained queue length in the switch, we study the dynamic behavior of micro-burst traffic in six typical data center communication scenarios. We find that the slope of the queue length evolution characterizes the dynamics of micro-burst traffic. The results also indicate that traditional solutions for mitigating bursty traffic do not apply to responsive micro-burst traffic.
(2) We propose the S-ECN scheme to quickly suppress micro-burst traffic. Inspired by the experimental observations, we design S-ECN as a probabilistic ECN marking mechanism that leverages the slope of the queue length evolution.
Experimental results show that S-ECN effectively suppresses the queue length increase caused by micro-burst traffic.
For non-responsive micro-burst traffic, our studies are as follows:
(1) We analyze and improve the dynamic threshold (DT) policy. We theoretically derive sufficient conditions for packet drops caused by micro-burst traffic, and quantitatively estimate the free buffer size at the moment packets are dropped. Based on the analysis, we propose the EDT policy to absorb as much micro-burst traffic as possible. Experimental and simulation results show that EDT absorbs more micro-burst traffic and thus reduces the completion time of small flows.
(2) We analyze the ECN mismarking problem and propose the CEDM marking scheme. We reveal the ECN mismarking problem through experiments and analyze it theoretically. Based on the analysis, we propose Combined Enqueue and Dequeue Marking (CEDM), which marks packets more accurately. Through extensive experiments and simulations, we show that CEDM greatly reduces the throughput loss caused by ECN mismarking.
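
To make the free-buffer observation concrete, the following is a minimal sketch of the classic dynamic threshold rule, in which a queue may grow only while its length is below alpha times the currently free shared buffer. It is not the EDT policy proposed in the thesis, whose details are not given here. The names `SharedBuffer` and `absorb_burst`, the packet-granularity buffer, and the assumption that the burst arrives so fast that nothing is dequeued are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBuffer:
    """Shared-memory switch buffer managed by the classic DT rule (illustrative)."""
    capacity: int                      # total shared buffer, in packets
    alpha: float = 1.0                 # DT control parameter
    queues: dict = field(default_factory=dict)  # port -> queue length

    def free(self) -> int:
        """Buffer space not occupied by any queue."""
        return self.capacity - sum(self.queues.values())

    def threshold(self) -> float:
        """DT threshold: alpha times the currently free buffer."""
        return self.alpha * self.free()

    def enqueue(self, port: str) -> bool:
        """Admit one packet unless the queue has reached the threshold."""
        qlen = self.queues.get(port, 0)
        if self.free() <= 0 or qlen >= self.threshold():
            return False               # packet is dropped
        self.queues[port] = qlen + 1
        return True


def absorb_burst(buf: SharedBuffer, port: str, packets: int) -> int:
    """Send a burst to one port with no dequeues; return packets admitted."""
    return sum(buf.enqueue(port) for _ in range(packets))


buf = SharedBuffer(capacity=100, alpha=1.0)
accepted = absorb_burst(buf, "p0", 80)
print(accepted, buf.free())  # 50 50
```

In this run, drops begin while half of the buffer is still free: with alpha = 1 and a single congested port, the queue stops growing at 50 packets. This reserved but unused space is the kind of headroom that a policy aiming to absorb more of a micro-burst could draw on.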