面向可重构处理阵列的编译映射技术研究

Research on Compilation and Mapping Technologies for Reconfigurable Architectures

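Author: 谷江源
  • Student ID
    2014******
  • Degree
    Doctoral
  • Email
    guj******.cn
  • Defense date
    2020.07.06
  • Advisor
    尹首一
  • Discipline
    Electronic Science and Technology
  • Pages
    145
  • Confidentiality level
    Public
  • Department
    026 Department of Microelectronics and Nanoelectronics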
  • Chinese keywords
    可重构处理器,循环映射优化技术,数据传输感知映射,双电压感知映射,老化应力感知映射
  • English keywords
    CGRAs, Loop Mapping Optimization Technique, Data-Transfer Aware Mapping, Dual-Vdd Aware Mapping, Aging-Stress Aware Mapping

Abstract

Emerging applications such as big data, cloud computing, and artificial intelligence (AI) demand ever more computing resources to process growing volumes of data. At the same time, as process technology has scaled into the deep-submicron era, very-large-scale integrated circuits face severe challenges in power consumption and energy efficiency. Coarse-grained reconfigurable architectures (CGRAs) are a typical spatially parallel computing architecture that combines the high flexibility of general-purpose processors (GPPs) with the high energy efficiency of application-specific integrated circuits (ASICs), and they provide abundant on-chip computing, storage, and interconnect resources. CGRAs typically offer lower power consumption, higher energy efficiency, and lower reconfiguration overhead, so computing tasks can be dynamically reconfigured more quickly, which gives them broad application prospects and market demand. One of the biggest obstacles to the practical use of reconfigurable computing chips, however, is the design of an efficient and easy-to-use compiler. Within that problem, an important research direction is a mapping algorithm that gives the loops in data- and compute-intensive applications a more efficient, lower-power, and longer-lasting high-quality execution.

Against this background, this thesis studies compilation and mapping techniques for reconfigurable processors. It discusses in detail and presents a complete tool-chain framework covering hardware architecture simulation and software compilation and mapping for reconfigurable computing processors. It then optimizes the scheduling and mapping of loops onto CGRAs from three perspectives: high efficiency, low power consumption, and longer lifetime.

(1) From the perspective of high efficiency, the thesis proposes an on-chip data-transfer-aware loop mapping algorithm for CGRAs. It maps the data dependencies of a loop onto different on-chip hardware resources, such as processing elements (PEs), the global register file (GRF) shared across the PE array (PEA), the local register file (LRF) private to each PE, and shared on-chip memory. Experimental results show that it achieves the best loop execution performance while speeding up mapping compilation by about 11 times.

(2) From the perspective of low power consumption, the thesis combines the spatial dynamic dual-$V_{dd}$ switching technique on reconfigurable processors with loop mapping and proposes an on-chip dual-$V_{dd}$-aware loop mapping algorithm for CGRAs. While preserving the best loop execution performance, it maps short-delay operations onto PEs driven by the low supply voltage ($V_{ddL}$) wherever possible, which reduces the power consumed by loop execution and improves the energy efficiency of loops on CGRAs by about 1.41 times on average.

(3) From the perspective of longer lifetime, the thesis presents an on-chip aging-stress-aware loop mapping algorithm for CGRAs. It adopts a two-level Intra-Kernel and Inter-Kernel aging-stress optimization strategy to mitigate the accumulation of aging stress on the computing array. The strategy lowers the maximum accumulated aging stress on any PE and distributes aging stress more evenly across PEs, which ultimately extends the lifetime of the CGRA computing array by about 3.16 times.
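
The two ideas behind the low-power and lifetime contributions can be illustrated with a toy Python sketch. It is not the thesis's algorithm: the names `Op`, `assign_voltage`, and `place_balanced`, the `slowdown` factor, and the per-operation stress values are all hypothetical, and the sketch ignores scheduling, routing, and register constraints that a real CGRA mapper must respect. The first function sends an operation to a low-voltage PE only when its stretched delay still fits in one clock period, so performance is unchanged. The second greedily places each operation on the PE with the least accumulated stress, which lowers the maximum and evens out the distribution.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    delay_ns: float  # assumed delay of the operation at the high supply voltage


def assign_voltage(ops, clock_ns, slowdown=1.5):
    """Return {op name: "VddL" or "VddH"}.

    `slowdown` is an assumed factor by which a low-voltage PE stretches
    the delay. An op is sent to a low-voltage PE only if the stretched
    delay still fits in one clock period.
    """
    return {
        op.name: "VddL" if op.delay_ns * slowdown <= clock_ns else "VddH"
        for op in ops
    }


def place_balanced(op_stress, num_pes, initial_stress=None):
    """Greedy balancing: heaviest op first, onto the least-stressed PE.

    `op_stress` maps an op name to a hypothetical stress contribution.
    Returns (placement, per-PE accumulated stress).
    """
    stress = list(initial_stress) if initial_stress else [0.0] * num_pes
    placement = {}
    for name, s in sorted(op_stress.items(), key=lambda kv: -kv[1]):
        pe = min(range(num_pes), key=lambda i: stress[i])
        placement[name] = pe
        stress[pe] += s
    return placement, stress


if __name__ == "__main__":
    ops = [Op("add", 1.0), Op("mul", 3.0)]
    print(assign_voltage(ops, clock_ns=4.0))
    print(place_balanced({"a": 4, "b": 3, "c": 2, "d": 1}, num_pes=2))
```

Passing the stress vector left by an earlier kernel as `initial_stress` gives a rough feel for balancing across kernels as well as within one, again only as an illustration of the idea.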