Non-Volatile-Memory-Based High-Energy-Efficiency, High-Reliability Computing-in-Memory Circuits

Author: 李旻谚
  • Student ID
    2021******
  • Degree
    Master's
  • Email
    lmy******.cn
  • Defense date
    2024.05.20
  • Supervisor
    李学清
  • Discipline
    Electronic Science and Technology
  • Pages
    78
  • Confidentiality
    Public
  • Department
    023 Department of Electronic Engineering
  • Keywords
    neural network accelerators; non-volatile memory; computing-in-memory; high energy efficiency; high reliability

Abstract

Nowadays, as data continues to proliferate, data-intensive intelligent applications such as autonomous driving, large language models, multimodal learning, and recommendation systems have become increasingly pervasive. The parameter counts of deep-learning-based large language models have reached the scale of hundreds of billions, and the massive volume of data to be processed places ever higher demands on the performance, power consumption, and reliability of computing hardware. To achieve efficient hardware acceleration of intelligent algorithms, this thesis presents two high-reliability, high-energy-efficiency computing-in-memory (CIM) circuits based on non-volatile memory (NVM).

The first design addresses the high power consumption and long latency of NVM-based CIM circuits, which undermine the inherent advantages of NVM devices such as fast read/write and low leakage. This work proposes a low-power, low-latency ferroelectric-FET-based logic-in-memory circuit that combines three techniques: fully dynamic voltage-domain logic computation, direct write-back after computation, and charging-energy optimization for computation. All logic operations consume less than 10 fJ on average with a worst-case latency below 1 ns, reducing both energy and latency by nearly 100x compared with other NVM-based CIM works.
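As a purely illustrative aside, the following Python sketch mimics the dataflow of the first design at a functional level only: operands stay inside the array, a dynamic (precharge/evaluate-style) step produces the logic result, and the result is written directly back into a cell instead of being read out. The class name FeFETArray, its address-based API, and the NAND/NOR selection are hypothetical; the sketch models the behavior, not the analog voltage-domain circuit or its energy figures.

```python
# Toy functional model of the logic-in-memory dataflow described above:
# operands are stored states, a dynamic (precharge/evaluate) step produces
# the logic result, and the result is written back into the array without
# leaving it. FeFETArray and the NAND/NOR evaluation are illustrative
# assumptions, not the thesis circuit itself.

class FeFETArray:
    """Bit-per-cell array; True/False stands in for the two polarization states."""

    def __init__(self, size: int):
        self.cells = [False] * size

    def write(self, addr: int, value: bool) -> None:
        self.cells[addr] = value

    def _evaluate(self, a: int, b: int, op: str) -> bool:
        # Dynamic-logic analogy: a precharged bitline stays high unless a
        # conducting path discharges it; which paths conduct depends on the
        # stored states selected by the operation.
        x, y = self.cells[a], self.cells[b]
        if op == "NAND":
            return not (x and y)   # series pull-down discharges only if both are on
        if op == "NOR":
            return not (x or y)    # parallel pull-down discharges if either is on
        raise ValueError(f"unsupported op: {op}")

    def compute_and_write_back(self, a: int, b: int, dest: int, op: str) -> bool:
        """Evaluate op(a, b) and write the result directly back into the array."""
        result = self._evaluate(a, b, op)
        self.write(dest, result)       # direct write-back: no external readout
        return result


if __name__ == "__main__":
    arr = FeFETArray(8)
    arr.write(0, True)
    arr.write(1, False)
    print(arr.compute_and_write_back(0, 1, dest=2, op="NAND"))  # True
    print(arr.compute_and_write_back(0, 1, dest=3, op="NOR"))   # False
```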

The second design tackles the reliability of CIM. Non-ideal factors in NVM devices, such as device-to-device variation, write variation, and read disturb, limit improvements in the sensing margin, throughput, and energy efficiency of CIM. Through multi-level co-optimization across memory devices, cell circuits, and array circuits, this work proposes a local recovery circuit module for resistive-RAM-based CIM that reduces the impact of device non-idealities on computing accuracy and improves both reliability and energy efficiency. Compared with other CIM works, throughput is improved by 3x to 15x at the same computing accuracy, and energy efficiency by 2x to 10x. The work further proposes a dynamic reference-boundary adaptation technique that mitigates the effect of device state drift on computing accuracy during computation; compared with the conventional write-and-verify method, energy and latency are reduced by about 100x and 1.3x, respectively.

In summary, this research optimizes the power consumption and latency of NVM-based CIM through a voltage-domain computing scheme with direct write-back, truly realizing logic computation inside the memory and laying a foundation for subsequent low-power, low-latency CIM. To address the reliability problem, it proposes a calibration technique based on local recovery modules that effectively reduces the impact of non-idealities such as device variation on computing accuracy and parallelism, providing technical support for high-reliability, high-energy-efficiency CIM. Together, the two designs demonstrate the feasibility and significance of cross-layer co-optimization for high-performance, high-energy-efficiency neural network acceleration.
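To make the dynamic reference-boundary idea of the second design concrete, here is a minimal NumPy sketch under toy assumptions: as the two stored conductance states drift, a fixed sensing reference misreads more and more cells, while a reference re-derived from a few tracking cells follows the drift without re-writing the whole array (as write-and-verify would). All numbers (Gaussian state means, spread, drift rates, 64 tracking cells) are invented for illustration and are not measured device data.

```python
# Minimal NumPy sketch of dynamic reference-boundary adaptation: re-derive the
# sensing reference from a handful of tracking cells instead of re-writing
# every cell (write-and-verify). The distribution and drift parameters below
# are made up for illustration only.

import numpy as np

rng = np.random.default_rng(0)

def sample_states(n, mu_low, mu_high, sigma):
    """Return stored bits and their (noisy) conductances."""
    bits = rng.integers(0, 2, size=n)
    mu = np.where(bits == 1, mu_high, mu_low)
    return bits, rng.normal(mu, sigma)

def bit_error_rate(bits, conductance, reference):
    read = (conductance > reference).astype(int)
    return np.mean(read != bits)

# Nominal distributions (arbitrary units) and a fixed reference at their midpoint.
mu_low, mu_high, sigma = 10.0, 30.0, 3.0
fixed_ref = 0.5 * (mu_low + mu_high)

for drift in [0.0, 4.0, 8.0]:          # both states drift downward over time
    bits, g = sample_states(100_000, mu_low - drift, mu_high - 2 * drift, sigma)

    # Dynamic boundary: estimate the current state means from a few known
    # "tracking" cells and place the reference between them.
    track_bits, track_g = sample_states(64, mu_low - drift, mu_high - 2 * drift, sigma)
    dyn_ref = 0.5 * (track_g[track_bits == 0].mean() + track_g[track_bits == 1].mean())

    print(f"drift={drift:4.1f}  fixed-ref BER={bit_error_rate(bits, g, fixed_ref):.4f}  "
          f"dynamic-ref BER={bit_error_rate(bits, g, dyn_ref):.4f}")
```

In this toy model the fixed reference degrades sharply once the states have drifted past it, while the adapted reference keeps the bit error rate near its nominal value; the thesis applies the same idea in circuit form to preserve computing accuracy at far lower energy than repeated write-and-verify.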