Efficiency Optimization and System Integration of Heterogeneous Memory Architectures

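Author: 李一苇 (Li Yiwei)
  • Student ID
    2019******
  • Degree
    Ph.D.
  • Email
    liy******.cn
  • Defense date
    2024.05.24
  • Advisor
    高鸣宇 (Gao Mingyu)
  • Discipline
    Computer Science and Technology
  • Confidentiality level
    Public
  • Department
    047 Institute for Interdisciplinary Information Sciences
  • Keywords
    heterogeneous memory architecture; dynamic random-access memory; non-volatile memory; near-memory computing; data caching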

Abstract

In the era of big data and deep learning, applications exhibit diverse memory requirements, including higher bandwidth, lower latency, and greater storage density. These evolving demands for both high storage capacity and high data access efficiency have become key bottlenecks in modern computing systems. They pose serious challenges to traditional DRAM-based memory systems and highlight the need for new memory architectures that can adapt to complex application demands.

Emerging memory technologies such as High-Bandwidth Memory (HBM) and Non-Volatile Memory (NVM), and advanced memory integration schemes such as Near-Data Processing (NDP) and Compute Express Link (CXL), offer promising ways to improve memory characteristics, but each also faces fundamental limitations. HBM provides substantially higher memory bandwidth but only limited capacity. NVM increases storage capacity through better cell density scaling but has slower data access. NDP minimizes data movement overheads by bringing computation closer to data, while CXL allows remote memory to be accessed directly in order to extend memory capacity. These distinct strengths and weaknesses motivate heterogeneous memory architectures, which combine multiple memory technologies to build balanced systems with both high capacity and high efficiency.

This thesis addresses two key challenges of current heterogeneous memory architectures: efficiency improvement and system-level integration. First, we propose Baryon and Trimma to improve execution efficiency and metadata management, respectively. Baryon optimizes heterogeneous memory systems through memory compression and sub-blocking, improving both the space utilization of the fast memory and the bandwidth utilization of the slow memory. It achieves speedups of up to 1.68× and 2.50× over prior cache-based and flat-addressing designs, respectively. Trimma introduces a new approach to metadata management in heterogeneous memory systems that reduces both the metadata storage cost and the lookup latency. It delivers speedups of up to 2.4×, and 1.58× on average, over conventional designs.

Furthermore, we propose Hydrogen and NDPExt to integrate heterogeneous memory architectures into complex full-system environments. Hydrogen partitions critical heterogeneous memory resources to match the distinct demands of CPU and GPU workloads on a heterogeneous computing platform, outperforming existing designs by up to 1.31×. NDPExt targets systems that combine NDP and CXL, and uses a software-defined data stream abstraction together with data placement algorithms to optimize data access performance. It achieves a 2.4× speedup over previous work.

In summary, this thesis presents a structured and comprehensive approach to improving the efficiency and system-level integration of heterogeneous memory architectures. Through the new methods and architectures of Baryon, Trimma, Hydrogen, and NDPExt, it addresses the complexity of modern computing demands.