Research on Application Acceleration and Service Framework for In-Network Computing

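Author: 徐文佺
  • Student ID
    2017******
  • Degree
    Ph.D.
  • Email
    ker******com
  • Defense date
    2023.05.13
  • Supervisor
    刘斌
  • Discipline
    Computer Science and Technology
  • Pages
    145
  • Confidentiality level
    Public
  • Academic unit
    024 Department of Computer Science
  • Keywords (Chinese)
    In-network computing, Approximate cache, Runtime, Heterogeneous programmable network, Automatic program orchestration
  • Keywords (English)
    In-network computing, Approximate cache, Runtime, Programmable network, Automatic program orchestration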

Abstract

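Programmable Data Plane (PDP) technology lets users freely define and use the resources of network devices, and it has given rise to In-Network Computing (INC). INC offloads computation tasks to high-speed network devices, effectively improving application throughput and latency. One such technique is approximate (fuzzy) caching, which stores computation results and accelerates applications by replacing computation with lookup. Existing software-based cache designs, however, have limited performance, and implementing approximate caching in the network with INC promises to improve cache performance greatly. Yet the limited capabilities of current network devices and the inherent shortcomings of the INC model raise the following challenges: (1) approximate caching relies on complex designs and algorithms that cannot simply be implemented on hardware network devices with limited resources and capabilities; (2) network devices differ in capabilities and resources, so deploying INC requires choosing suitable hardware and adapting the design to it, which is very difficult; (3) because INC depends on network devices, current INC development is tightly coupled with network operations: programmable devices must be fully opened to users, users must master operational details such as devices, topology, and network protocols, and programs from different users interfere with each other, all of which hinders the healthy development of INC. To address these problems, this dissertation makes the following contributions:

1. Efficient network-edge approximate cache unit. To address the low performance of host-side approximate caching, this dissertation proposes a hardware approximate cache at the network edge. The cache uses a hardware architecture based on Ternary Content-Addressable Memory (TCAM) and Near-Memory Computing (NMC): the TCAM performs approximate matching, and the NMC verifies whether a cached result can be reused. An efficient approximate feature-matching scheme is built from algorithms such as code aggregation, and a prototype system is implemented on an FPGA. Experimental results show that the system achieves hit and accuracy rates above 90%, with latency three orders of magnitude lower than host-side model inference.

2. Runtime in-network approximate cache system. To meet the dynamic in-network approximate caching needs of multiple users, this dissertation proposes an in-network approximate cache system that can be deployed immediately. To overcome the inability of existing programmable switches to be programmed at runtime, it introduces a unified approximate-cache abstraction and a unitized match-table structure, together with a series of optimizations that fit the design to switch hardware. Experimental results show that the system can serve cache requests from millions of terminals simultaneously.

3. INC-as-a-Service framework. To address inefficient INC development and deployment, this dissertation is the first to propose an INC-as-a-Service framework for application developers, ClickINC, which decouples INC development from device and network operations. A unified INC abstraction and a convenient programming-language interface hide network devices, topology, and network protocols from users, who need to focus only on the INC logic. Automated program orchestration tools and a compiler deploy INC programs into the network automatically and efficiently. Experimental results show that, compared with state-of-the-art work, ClickINC improves programming efficiency by more than 10× and orchestration efficiency by more than 1000×.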
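To make the first contribution concrete, the following is a minimal software model of a TCAM-style approximate lookup followed by a reuse check, written in Python. It is a hypothetical sketch, not the dissertation's design: the class `ApproxCache`, the per-dimension thresholding in `binarize`, and the Euclidean-distance check against `max_dist` are illustrative assumptions standing in for the code-aggregation encoding and the NMC verification logic, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TernaryEntry:
    value: int             # required bit pattern (only care bits are meaningful)
    mask: int              # 1 = care bit, 0 = "don't care" wildcard
    result: str            # cached computation result, e.g. an inference label
    feature: List[float]   # original feature kept for the reuse check


class ApproxCache:
    """Toy model of a TCAM lookup followed by a near-memory reuse check.

    Hypothetical: binarize() thresholds each dimension; it stands in for the
    dissertation's code-aggregation encoding, which is not reproduced here.
    """

    def __init__(self, threshold: float = 0.0, max_dist: float = 1.0):
        self.entries: List[TernaryEntry] = []  # list order = TCAM row priority
        self.threshold = threshold             # per-dimension binarization cut
        self.max_dist = max_dist               # reuse bound for verification

    def binarize(self, feature: Sequence[float]) -> int:
        """Map a feature vector to a bit code: bit i is set if feature[i] > threshold."""
        code = 0
        for i, x in enumerate(feature):
            if x > self.threshold:
                code |= 1 << i
        return code

    def insert(self, feature: Sequence[float], result: str, wildcard_bits: int = 0) -> None:
        """Add a row; bits set in wildcard_bits become "don't care" positions."""
        full = (1 << len(feature)) - 1
        mask = full & ~wildcard_bits
        code = self.binarize(feature)
        self.entries.append(TernaryEntry(code & mask, mask, result, list(feature)))

    def lookup(self, feature: Sequence[float]) -> Optional[str]:
        """Return a reusable result, or None on a miss or a failed reuse check."""
        code = self.binarize(feature)
        for entry in self.entries:
            if (code & entry.mask) == entry.value:
                # A TCAM reports only the highest-priority matching row, so a
                # failed verification is treated as a miss, not a fall-through.
                dist = sum((a - b) ** 2 for a, b in zip(feature, entry.feature)) ** 0.5
                return entry.result if dist <= self.max_dist else None
        return None


cache = ApproxCache(threshold=0.5, max_dist=0.3)
cache.insert([0.9, 0.1, 0.8], "cat")
print(cache.lookup([0.85, 0.2, 0.7]))   # -> cat (same code, close enough to reuse)
print(cache.lookup([0.6, 0.4, 0.55]))   # -> None (same code, rejected by the check)
```

Clearing mask bits through `wildcard_bits` widens the set of query codes an entry matches, which is what makes the TCAM stage approximate; the distance check then guards accuracy by refusing reuse when the match is too loose.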

Programmable Data Plane (PDP) technology allows users to freely define and use network device resources, giving rise to In-Network Computing (INC). INC offloads computation tasks to high-speed network devices, effectively improving the throughput and latency of host-side applications. A typical example is approximate caching, which stores and reuses computation results so that applications are accelerated by cache lookups instead of recomputation; however, existing software-based designs limit cache performance. Implementing approximate caching in the network with INC therefore has great potential to improve cache performance. Yet the limited capabilities of network devices and the inherent flaws of current INC in data centers raise the following challenges: (1) because approximate caching relies on complex designs and algorithms, it cannot easily be implemented in hardware with limited resources and capabilities; (2) deploying INC on network devices with differing capabilities and resources requires careful selection of suitable devices and hardware-specific adaptation, which is difficult; (3) INC development is tightly coupled with network operations on the devices: users need full access to programmable network devices as well as knowledge of devices, topology, and network protocols, and INC programs from different users are not isolated from one another, all of which impedes the development of INC. To address these issues, this dissertation makes the following contributions:

1. Efficient edge-side approximate cache unit. To address the low performance of host-side, software-based approximate caching, this dissertation proposes an efficient hardware approximate cache unit at the network edge. The cache adopts a hardware architecture based on Ternary Content-Addressable Memory (TCAM) and Near-Memory Computing (NMC), in which the TCAM performs approximate matching and the NMC verifies whether a cached result can be reused. An efficient approximate feature-matching scheme is built on core algorithms such as code aggregation, and a prototype system is implemented on an FPGA card. Experimental results show that the cache unit achieves hit and accuracy rates above 90% and a speedup of over 1000× compared with inference computation on servers.

2. Runtime in-network approximate cache system. To meet the dynamic in-network approximate caching needs of multiple users, this dissertation proposes an immediately deployable in-network approximate cache system based on programmable switches. To overcome the inability of existing programmable switches to be programmed at runtime, it introduces a unified approximate-cache abstraction and a unitized match-table structure, and a series of hardware-adaptive designs keeps the cache within switch resource limits. Experimental results show that it can serve cache requests from millions of terminals simultaneously.

3. INC-as-a-Service framework. To address the inefficiency of INC development and deployment, this dissertation proposes ClickINC, an INC-as-a-Service framework that decouples INC development from network operations. A unified INC abstraction and a convenient programming-language interface shield users from network devices, topology, and network operations, so that users can focus solely on INC design. Automated orchestration tools and compilers deploy INC programs in the network automatically and efficiently. Experimental results show that, compared with state-of-the-art work, ClickINC improves programming efficiency by more than 10× and orchestration efficiency by more than 1000×.
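The runtime-programming constraint behind the second contribution can be illustrated with a small model: the switch program is compiled once with a pool of fixed-size table units, and tenants are added or removed afterwards only by assigning units and writing tenant-tagged entries. The names `UnitizedTable`, `allocate`, and `release`, the ceiling-division sizing, and the per-tenant capacity check are hypothetical choices for illustration, not the dissertation's actual table layout.

```python
from typing import Dict, List, Optional, Tuple


class UnitizedTable:
    """Hypothetical model of a match table pre-compiled as fixed-size units.

    The data-plane program never changes; tenants are added and removed at
    runtime only by assigning units and writing tenant-tagged entries.
    """

    def __init__(self, num_units: int, unit_size: int):
        self.unit_size = unit_size
        self.free_units: List[int] = list(range(num_units))
        self.owner: Dict[str, List[int]] = {}          # tenant -> unit IDs held
        self.entries: Dict[Tuple[str, int], str] = {}  # (tenant, key) -> result

    def allocate(self, tenant: str, capacity: int) -> bool:
        """Reserve enough whole units for `capacity` entries, if available."""
        needed = -(-capacity // self.unit_size)  # ceiling division
        if needed > len(self.free_units):
            return False
        units = [self.free_units.pop() for _ in range(needed)]
        self.owner.setdefault(tenant, []).extend(units)
        return True

    def capacity(self, tenant: str) -> int:
        return len(self.owner.get(tenant, [])) * self.unit_size

    def used(self, tenant: str) -> int:
        return sum(1 for t, _ in self.entries if t == tenant)

    def insert(self, tenant: str, key: int, result: str) -> bool:
        """Write an entry; refuse new keys once the tenant's units are full."""
        if (tenant, key) not in self.entries and self.used(tenant) >= self.capacity(tenant):
            return False
        self.entries[(tenant, key)] = result
        return True

    def lookup(self, tenant: str, key: int) -> Optional[str]:
        # Matching on (tenant, key) keeps one tenant from hitting another's entries.
        return self.entries.get((tenant, key))

    def release(self, tenant: str) -> None:
        """Return a tenant's units to the pool and drop its entries."""
        self.free_units.extend(self.owner.pop(tenant, []))
        self.entries = {k: v for k, v in self.entries.items() if k[0] != tenant}


table = UnitizedTable(num_units=4, unit_size=2)
table.allocate("A", 3)          # needs 2 units
table.insert("A", 7, "cached")
print(table.lookup("A", 7))     # -> cached
print(table.lookup("B", 7))     # -> None (tenant B never wrote key 7)
```

Because only table entries change when a tenant arrives or leaves, no switch recompilation is needed in this model, and tagging every entry with the tenant ID gives the per-user isolation that challenge (3) calls for.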