In recent years, the performance growth of general-purpose processors has gradually hit a bottleneck, making it hard to meet the demands of constantly emerging application scenarios. The rapid development of neural network algorithms has opened up broad space for exploring new computing system architectures. The "general-purpose processor + neural network accelerator" heterogeneous computing system, and the artificial intelligence (AI) computing system centered on the neural network accelerator, offer solutions with better performance and energy efficiency for general-purpose computing and for domain-specific AI computing, respectively. The neural network accelerator plays a particularly important role in both kinds of systems.

This dissertation points out that neural network accelerator design still falls short in computation patterns, computing architecture, and memory optimization: the mathematical relationship between computation patterns and execution objectives needs to be modeled, so that the optimal computation pattern can be selected under different execution objectives and network structures; a computing architecture with dynamically reconfigurable logic is needed, so that the computation pattern can be flexibly adjusted as each layer of a network executes, achieving the best execution results; and high-density memory needs to be introduced to address the memory-access problem, while the extra overhead it brings must be optimized.

To meet these needs, this dissertation distills two optimization design methodologies for neural network accelerators: a computing architecture design methodology based on "computation pattern-dynamic reconfiguration", and a memory optimization methodology based on "device characteristics-error resilience". Guided by these methodologies, this dissertation completes three research works:

(1) The neural network computing architecture RNA for general-purpose neural approximation. Taking minimized computation latency as its execution objective, RNA dynamically reconfigures its hardware resources to resolve the potential mismatch between neural network topologies and fixed hardware resources, achieving a 572x accelerator-level speedup and a 7.9x application-level speedup over a conventional general-purpose computing system.

(2) The neural network computing architecture DNA for domain-specific AI. Taking maximized throughput and energy efficiency as its execution objectives, DNA dynamically reconfigures its hardware resources to realize hybrid data-reuse patterns and a parallel convolution mapping method, achieving up to 93% computing resource utilization and a 3.4x throughput improvement, with system energy efficiency one to two orders of magnitude higher than state-of-the-art works. Thinker, an AI computing chip based on the DNA architecture, has been verified through tape-out.

(3) The retention-time-based memory optimization framework RANA for neural networks. RANA exploits the error resilience of neural network algorithms and the short lifetime of data during computation to introduce high-density eDRAM into neural network accelerators while requiring almost no refresh, reducing off-chip memory accesses by 41.7% and overall system energy by 66.2%.

The three research works and the two optimization design methodologies complement each other. All three works have been thoroughly validated by experiments and are of high practical value. The methodologies not only provide strong support for the research works, but also offer guidance for future research directions in neural network accelerator architecture.
In recent years, general-purpose processors have struggled to keep increasing their performance for emerging applications under a limited power budget. The rapid development of neural networks has opened up a new space for exploring new computing systems. The "general-purpose processor + neural network accelerator" heterogeneous computing system, and the neural-network-accelerator-centric artificial intelligence (AI) computing system, are two promising solutions that enable higher performance and efficiency for general-purpose computing and AI domain-specific computing, respectively. Neural network accelerator design is central to both computing systems.

This dissertation points out that neural network accelerator design needs improvements in computation patterns, computing architecture, and memory optimization: modeling the relationship between computation patterns and execution objectives is necessary, so that the best computation pattern can always be selected when the objective or the neural network changes; a dynamically reconfigurable computing architecture is required to adjust the computation pattern for each layer of a neural network and thereby optimize the execution objective; and high-density memory is needed to deal with the memory bottleneck, while its overhead should be alleviated to maximize the benefits of the increased memory capacity.

This dissertation proposes two design optimization principles for neural network accelerators: computation pattern-dynamic reconfigurability based architecture design, and device characteristics-error resilience based memory optimization. Three works are completed under the guidance of these two principles:

(1) The neural network computing architecture RNA for general-purpose neural approximation. RNA takes minimizing computing latency as its objective. It dynamically reconfigures its architecture to solve the mismatch between diverse network topologies and fixed hardware resources, achieving an accelerator speedup of 572x and an application speedup of 7.9x.

(2) The neural network computing architecture DNA for AI applications. DNA takes maximizing throughput and energy efficiency as its objectives. It dynamically reconfigures its architecture to realize the proposed hybrid computation pattern and parallel-output-oriented mapping method, achieving a resource utilization of 93%, a throughput improvement of 3.4x, and energy efficiency one to two orders of magnitude higher than state-of-the-art works. Thinker, a DNA-based AI chip, has been fabricated in 65nm CMOS technology.

(3) The retention-aware memory optimization framework RANA. RANA exploits neural networks' error resilience and short data lifetimes to enhance the hardware's tolerance of reduced eDRAM refresh. With RANA, a neural network accelerator can use eDRAM to enlarge its on-chip buffer capacity with almost no refresh overhead, saving 41.7% of off-chip memory accesses and 66.2% of system energy consumption.

This dissertation is highlighted by the three works and the two design optimization principles. All three works have been carefully evaluated to prove their practical value. Supported by the works, the two proposed principles have shown their value in guiding neural network accelerator design, and will also play a significant role in future neural network accelerator development.
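The idea of selecting a computation pattern per layer according to the execution objective can be illustrated with a small model. The following Python sketch is not the dissertation's actual model: the layer parameters, cost constants, and candidate patterns (pe_utilization, dram_traffic_ratio, and so on) are hypothetical, and serve only to show how the optimal pattern shifts when the objective changes from latency to energy.

# Minimal sketch (hypothetical model): choose a computation pattern per layer
# by evaluating each candidate against the current execution objective.
from dataclasses import dataclass

@dataclass
class Layer:
    macs: int          # multiply-accumulate operations in this layer
    weight_bytes: int  # size of the layer's weights
    act_bytes: int     # size of the layer's input/output activations

@dataclass
class Pattern:
    name: str
    pe_utilization: float      # fraction of PEs kept busy (0..1)
    dram_traffic_ratio: float  # off-chip bytes moved per byte of layer data

NUM_PES = 256
FREQ_HZ = 200e6
ENERGY_PER_DRAM_BYTE = 100e-12  # hypothetical energy cost, joules/byte

def latency(layer: Layer, p: Pattern) -> float:
    """Compute time dominated by MACs spread over the utilized PEs."""
    return layer.macs / (NUM_PES * p.pe_utilization * FREQ_HZ)

def energy(layer: Layer, p: Pattern) -> float:
    """Energy dominated by off-chip traffic under this reuse pattern."""
    traffic = (layer.weight_bytes + layer.act_bytes) * p.dram_traffic_ratio
    return traffic * ENERGY_PER_DRAM_BYTE

def best_pattern(layer: Layer, patterns, objective):
    """Pick the pattern minimizing the chosen objective for this layer."""
    return min(patterns, key=lambda p: objective(layer, p))

patterns = [
    Pattern("weight-reuse", pe_utilization=0.9, dram_traffic_ratio=1.8),
    Pattern("output-reuse", pe_utilization=0.7, dram_traffic_ratio=1.2),
]
layer = Layer(macs=10**8, weight_bytes=2**20, act_bytes=2**21)
print(best_pattern(layer, patterns, latency).name)  # latency-optimal choice
print(best_pattern(layer, patterns, energy).name)   # energy-optimal choice

In this toy model the latency objective favors the high-utilization pattern while the energy objective favors the low-traffic one, which is the kind of objective-dependent trade-off a dynamically reconfigurable architecture exploits layer by layer.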
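The refresh-skipping idea behind RANA can likewise be sketched. Below is a minimal Python illustration under assumed numbers (the retention time and per-refresh energy are hypothetical placeholders, not measured values from the dissertation): data whose lifetime ends before the eDRAM retention time elapses needs no refresh at all, which is exactly the property that short-lived activations exhibit.

# Minimal sketch (hypothetical numbers): an eDRAM buffer needs refresh only
# if its data lives longer than the cell retention time.
EDRAM_RETENTION_S = 45e-6  # assumed worst-case eDRAM retention time
REFRESH_ENERGY_J = 1e-9    # assumed energy per refresh of one buffer

def refresh_energy(lifetime_s: float,
                   retention_s: float = EDRAM_RETENTION_S,
                   per_refresh_j: float = REFRESH_ENERGY_J) -> float:
    """Energy of the refreshes needed to keep data alive for its lifetime."""
    if lifetime_s <= retention_s:
        return 0.0  # data is consumed before it can decay: no refresh at all
    n_refreshes = int(lifetime_s // retention_s)
    return n_refreshes * per_refresh_j

# Activations consumed within 30 us need no refresh, while data resident
# for 10 ms would be refreshed ~222 times under these assumed numbers.
print(refresh_energy(30e-6))  # 0.0
print(refresh_energy(10e-3))  # ~2.22e-07

Combined with the algorithm's error resilience, which tolerates the occasional decayed bit when a lifetime slightly exceeds the retention time, this is why most refresh activity can be eliminated rather than merely reduced.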