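Neural network computing has achieved breakthroughs in many fields and triggered a new round of revolution in information technology, but it also places higher demands on the performance and energy efficiency of the underlying hardware. Computing-in-memory systems based on analog resistive random access memory (RRAM) effectively alleviate the memory wall problem and promise high-performance, low-power computing. Reliability degradation of RRAM is a key factor affecting the accuracy of computing-in-memory systems. However, previous reliability studies have mostly targeted memory applications, and reliability studies oriented toward neural network applications are still lacking. RRAM reliability research faces two challenges. First, unlike traditional memory applications, which have a relatively high tolerance for degradation, system accuracy in neural network applications is more sensitive to the conductance changes caused by device reliability degradation, and the extent of this impact remains to be clarified. Second, conductance is updated during network training in a different way from the cyclic program/erase operation of memory applications, so previous reliability characterization methods are difficult to apply to neural networks. To address these problems, this thesis studies neural-network-oriented reliability evaluation and characterization methods for analog RRAM. The main innovations are as follows:
1. Starting from the application requirements of neural network computing, a cross-level reliability analysis and evaluation framework from device to system was established. Within this framework, neural-network-oriented reliability evaluation methods, characterization methods, and a method for quantifying reliability impact were proposed, and the efficiency of endurance testing was improved by more than 700 times. The framework provides methodological guidance for evaluating, from multiple perspectives, how the reliability degradation of analog RRAM affects system accuracy.
2. For the data retention characteristics of analog RRAM, and to address the lack of retention degradation models, array-level retention degradation models applicable to multiple resistance states, multiple temperatures, and multiple array configurations were established from a statistical perspective. Using the proposed reliability-impact quantification method, the effect of retention degradation on system accuracy was evaluated, the minimum data retention requirement of the devices for offline training in a specific neural network application was determined, and related optimization methods were proposed.
3. Because existing endurance characterization methods can hardly emulate the weight updates of online training, this thesis proposed a small-step incremental switching method and achieved 10^11 conductance updates on analog RRAM, five orders of magnitude more than the typical endurance of conventional binary memory, which meets the requirements of online training. Furthermore, by sampling the switching curves in stages, a model relating endurance to the coupled nonlinearity and on/off ratio was established, the impact of endurance on accuracy was quantified, and the coupling effect of endurance was shown to be the direct cause of accuracy loss in online training.
The endurance characterization method proposed in this thesis has been adopted by Tektronix, developed into a standard test module, and commercialized. These studies verify the effectiveness of the neural-network-oriented cross-level reliability evaluation method and lay a foundation for highly reliable neural network accelerator chips.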
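The retention-impact quantification described above (a statistical conductance-degradation model mapped onto network weights, followed by re-evaluating accuracy) can be illustrated with a small sketch. Every modeling choice in it is a hypothetical stand-in rather than the thesis's fitted model: the Gaussian spread that grows with log(time), the Arrhenius acceleration with an assumed 1.0 eV activation energy, the conductance window `G_MIN`/`G_MAX`, the differential-pair weight mapping, and the toy two-input linear classifier.

```python
import math
import random

# All constants and model forms below are illustrative assumptions,
# not the fitted retention models from the thesis.
K_B_EV = 8.617e-5          # Boltzmann constant in eV/K
G_MIN, G_MAX = 1e-6, 1e-4  # hypothetical conductance window in siemens


def arrhenius_factor(temp_k, ref_temp_k=300.0, ea_ev=1.0):
    """Assumed Arrhenius time-acceleration factor at temp_k relative to ref_temp_k."""
    return math.exp(ea_ev / K_B_EV * (1.0 / ref_temp_k - 1.0 / temp_k))


def retained_conductance(g, t_s, temp_k, rng, sigma_per_decade=0.02):
    """Sample a cell's conductance after baking for t_s seconds at temp_k.

    Hypothetical statistical model: Gaussian around the programmed value with a
    relative spread that grows linearly in log10 of the temperature-scaled time.
    """
    t_eff = t_s * arrhenius_factor(temp_k)
    rel_sigma = sigma_per_decade * math.log10(1.0 + t_eff)
    return max(0.0, rng.gauss(g, rel_sigma * g))  # conductance cannot go negative


def weight_to_pair(w):
    """Map a weight in [-1, 1] onto a differential pair (G+, G-)."""
    span = G_MAX - G_MIN
    return G_MIN + span * max(w, 0.0), G_MIN + span * max(-w, 0.0)


def pair_to_weight(g_pos, g_neg):
    """Read a weight back from a (possibly degraded) differential pair."""
    return (g_pos - g_neg) / (G_MAX - G_MIN)


def make_dataset(w_true, n, rng, margin=0.05):
    """Toy 2-D points labelled by w_true; points near the decision boundary are skipped."""
    data = []
    while len(data) < n:
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        score = w_true[0] * x[0] + w_true[1] * x[1]
        if abs(score) >= margin:
            data.append((x, 1 if score > 0 else 0))
    return data


def accuracy_after_retention(w_true, data, t_s, temp_k, seed=0):
    """Program w_true into conductance pairs, age every cell, then classify."""
    rng = random.Random(seed)
    w_read = []
    for w in w_true:
        g_pos, g_neg = weight_to_pair(w)
        w_read.append(pair_to_weight(retained_conductance(g_pos, t_s, temp_k, rng),
                                     retained_conductance(g_neg, t_s, temp_k, rng)))
    correct = 0
    for x, y in data:
        pred = 1 if w_read[0] * x[0] + w_read[1] * x[1] > 0 else 0
        if pred == y:
            correct += 1
    return correct / len(data)


if __name__ == "__main__":
    w_true = (0.8, -0.5)  # hypothetical trained weights
    data = make_dataset(w_true, 500, random.Random(42))
    for t_s in (0.0, 3600.0, 3.15e7):  # fresh, one hour, roughly one year
        acc = accuracy_after_retention(w_true, data, t_s, temp_k=358.0)
        print(f"t = {t_s:.2e} s at 358 K: accuracy = {acc:.3f}")
```

With `t_s = 0` the modeled spread is zero, so the classifier reproduces the ideal accuracy of 1.0. Larger `t_s` or `temp_k` widen the spread, so accuracy at those points is best compared by averaging over several `seed` values; under this sketch, a minimum retention requirement would be the longest (time, temperature) point whose averaged accuracy stays above a chosen target.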
Neural networks have achieved significant breakthroughs in many fields and have brought about a new revolution in information technology, but they also place higher demands on the performance and energy efficiency of hardware. Computing-in-memory systems based on analog resistive random access memory (RRAM) alleviate the memory wall and are expected to enable high-performance, low-power computing. However, reliability degradation of analog RRAM is a key factor leading to accuracy loss in neural networks, and earlier reliability studies mostly targeted memory applications, so the reliability impact of analog RRAM in neural networks has not yet been studied. RRAM reliability research faces two challenges. First, unlike traditional memory applications, which tolerate considerable reliability degradation, the accuracy of neural network applications is more sensitive to the conductance changes caused by reliability degradation, and the extent of this impact remains to be clarified. Second, the way conductance is updated during neural network online training differs from the cyclic program/erase operation of memory applications, so traditional electrical characterization methods cannot meet the needs of reliability research for neural networks. To address these limitations, this thesis focuses on reliability research methods for analog RRAM in neural networks. The innovations of this work are as follows:
1. This thesis established a cross-level reliability analysis and evaluation framework from device to system based on the application requirements of neural networks. Within this framework, evaluation methods, characterization and physical-mechanism analysis methods, and a reliability-impact quantification method were proposed, and the efficiency of endurance characterization was increased by more than 700 times. The framework provides a guideline for evaluating how the reliability degradation of analog RRAM affects the computing accuracy of neural networks.
2. Focusing on the retention characteristics of analog RRAM and addressing the lack of retention degradation models, this thesis established, from a statistical point of view, a series of array-level retention degradation models applicable to multiple resistance states, multiple temperatures, and multiple array configurations. By mapping the retention models into the neural network model, the impact of retention degradation on accuracy loss was quantified. Based on the measured data, the minimum retention requirement of analog RRAM for offline training in a specific computing application was determined, and a related optimization method was proposed.
3. Because existing endurance characterization methods make it difficult for RRAM devices to mimic the weight update process during online training, this thesis proposed a small-step incremental switching method. With this method, the analog RRAM achieved 10^11 conductance updates, 5 orders of magnitude more than the typical endurance of conventional binary RRAM, which meets the endurance requirements of online training. Further, by sampling the analog switching curves in stages, this thesis established a model relating endurance degradation to the coupled nonlinearity and on/off ratio. With this endurance model, the impact of endurance degradation on accuracy loss was quantified, showing that the coupling effect of endurance is the direct cause of accuracy loss in online training.
The endurance characterization method proposed in this thesis has been adopted by Tektronix, developed into a standard test module, and commercialized. The research on retention and endurance verifies the effectiveness and practicality of the neural-network-oriented cross-level reliability analysis and evaluation methods, which lay the foundation for neural network accelerators that tolerate reliability degradation.
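The staged-sampling step, in which a full potentiation curve is sampled at intervals during cycling and reduced to an on/off ratio and a nonlinearity value, can be sketched as follows. The saturating-exponential update model G(P) = G_min + B(1 - exp(-P/A)) is a behavioural form commonly used in device-to-system simulators and is an assumption here, as is the per-decade degradation trend in the hypothetical helper `degraded_device`; in a real test flow the curves would be measured rather than generated.

```python
import math


def potentiation_curve(g_min, g_max, a, p_max):
    """Conductance after each of p_max identical SET pulses.

    Assumed behavioural model G(P) = G_min + B * (1 - exp(-P / A)), with B chosen so
    that G(p_max) = g_max; a smaller A means a more nonlinear (faster-saturating) update.
    """
    b = (g_max - g_min) / (1.0 - math.exp(-p_max / a))
    return [g_min + b * (1.0 - math.exp(-p / a)) for p in range(p_max + 1)]


def extract_metrics(curve):
    """Return (on/off ratio, nonlinearity A) from one sampled potentiation curve.

    For this model the normalized midpoint conductance is m = 1 / (1 + exp(-P / (2A))),
    so A = P / (2 * ln(m / (1 - m))).
    """
    p_max = len(curve) - 1
    if p_max % 2:
        raise ValueError("need an even number of pulses to sample the midpoint")
    g0, g_end = curve[0], curve[-1]
    m = (curve[p_max // 2] - g0) / (g_end - g0)
    if not 0.5 < m < 1.0:
        raise ValueError("curve does not saturate; A is undefined for this model")
    return g_end / g0, p_max / (2.0 * math.log(m / (1.0 - m)))


def degraded_device(cycles, g_min=1e-6, g_max0=1e-4, a0=60.0, k_g=0.05, k_a=0.08):
    """Hypothetical endurance trend: per decade of cycling, the conductance window
    shrinks by k_g and A falls (the update becomes more nonlinear)."""
    decades = math.log10(cycles)
    g_max = g_min + (g_max0 - g_min) * max(0.05, 1.0 - k_g * decades)
    return g_min, g_max, a0 / (1.0 + k_a * decades)


def staged_sampling(stages, p_max=64):
    """At each cycle-count stage, sample one full curve and extract its metrics."""
    results = []
    for cycles in stages:
        g_min, g_max, a = degraded_device(cycles)
        curve = potentiation_curve(g_min, g_max, a, p_max)  # stand-in for a measured curve
        on_off, a_fit = extract_metrics(curve)
        results.append((cycles, on_off, a_fit))
    return results


if __name__ == "__main__":
    for cycles, on_off, a_fit in staged_sampling([10.0 ** k for k in range(3, 12, 2)]):
        print(f"{cycles:.0e} cycles: on/off = {on_off:.1f}, A = {a_fit:.1f}")
```

The closed-form midpoint fit avoids an iterative curve fit, at the cost of relying on this particular model shape. The resulting (cycles, on/off, A) triples are the kind of data a model relating endurance to the coupled nonlinearity and on/off ratio could be fitted to; feeding the degraded curves into an online-training simulation would then be one way to estimate the accuracy loss.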