Spatiotemporal Mapping Theory and Hardware Architecture Co-Design for Distributed Many-Core Brain-Inspired Chips

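Author: Wang Song (王松)
  • Student ID
    2021******
  • Degree
    Doctorate
  • Email
    956******com
  • Defense date
    2024.05.24
  • Supervisor
    Pei Jing (裴京)
  • Discipline
    Instrument Science and Technology
  • Confidentiality level
    Public
  • Department
    013 Department of Precision Instrument
  • Keywords
    spatiotemporal density; spatiotemporal resource scheduling; physical mapping; spatiotemporal mapping limit; hardware architecture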

Abstract

Brain-inspired computing chips draw on the information-processing mechanisms of the human brain and integrate existing computing technologies, offering an efficient computing platform for the development of artificial general intelligence. Chips with a brain-inspired computing architecture are characterized by distributed many-core organization, high parallelism, low power consumption, and integrated storage and computation. Their cores also interact through a variety of synchronous, asynchronous, and hybrid mechanisms, which makes mapping highly uncertain. As algorithms and brain-inspired systems become more closely integrated, the diversity of algorithms and the complexity of brain-inspired systems increasingly limit the computing power that brain-inspired chips can deliver. Research is still lacking on efficiency across the full path from resource scheduling for a single network, through physical mapping, to chip architecture. This thesis aims to improve the resource utilization efficiency of brain-inspired computing chips. It targets three issues: the inefficiency caused by the scheduling imbalance between the spatiotemporal resources of neural networks and those of the hardware; the inefficiency caused by the NP problem of physical mapping that arises from spatiotemporal resource scheduling; and the optimization of the hardware architecture so that it supports efficient mapping models. The main contributions and innovations on these three scientific problems are as follows:

1. To address the inefficiency caused by the imbalance between the unbounded demand of neural networks for storage and compute resources and the limited spatiotemporal resources of brain-inspired computing architectures, this thesis builds a mathematical model of spatiotemporal density mapping for the multiple tasks of a network. Based on spatiotemporal density mapping, it proposes a data-sharing mapping method, a mapping method that adjusts density through multi-layer coupling, and a non-uniform splitting mapping method. At the micro level of single-task spatiotemporal resource scheduling, it proposes non-cyclic, cyclic, and hybrid scheduling models together with a theory of the spatiotemporal mapping limit for a single task, and the analysis shows that cyclic resource scheduling can reach the single-task limit of spatiotemporal mapping efficiency. These models and methods are evaluated through extensive experimental analysis and comparison.

2. To address the NP problem of physical mapping that arises when the computing core clusters allocated by resource scheduling are deployed on a 2D plane, this thesis studies a physical mapping algorithm for each of the three scheduling models above: a 4D-to-2D layering algorithm, a Hamiltonian circuit algorithm, and a hybrid algorithm. It proposes the limits of single-task physical mapping in terms of power consumption and time, and proves that the Hamiltonian circuit algorithm can reach this physical mapping limit. Experiments show that the cyclic resource scheduling model paired with its matching Hamiltonian circuit mapping outperforms the other scheduling models.

3. To optimize the structure and the storage-to-compute ratio of the computing units in brain-inspired chips that support efficient mapping, this thesis proposes a high-dimensional MAC array model and, based on the dimension sizes of a 3D MAC array, builds a model relating memory capacity, the high-dimensional MAC array, and mapping efficiency. Building on the proposed cyclic resource scheduling, it optimizes the primitive instructions, the intelligent information storage mechanism, and the address system of a brain-inspired many-core chip. Finally, a simulation system is implemented and used for mapping comparisons, which provide preliminary verification of the advantages of this hardware architecture.

Keywords: spatiotemporal density; spatiotemporal resource scheduling; physical mapping; spatiotemporal mapping limit; hardware architecture
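
The pairing of cyclic scheduling with a Hamiltonian circuit mapping can be illustrated with a small sketch. It is not the algorithm from the thesis: it assumes a plain 2D mesh of cores with links only between horizontal and vertical neighbours, and it uses a textbook serpentine construction. The names `hamiltonian_cycle` and `ring_hops` are hypothetical. The point it shows is that when a ring of cores is placed along a Hamiltonian circuit of the mesh, every hand-off to the next core in the ring, including the wrap-around, costs exactly one hop.

```python
def hamiltonian_cycle(rows, cols):
    """Return the cells of a rows x cols mesh in Hamiltonian-circuit order.

    Assumption: 4-neighbour mesh. Such a circuit exists only when both
    sides are at least 2 and the number of cells is even.
    """
    if rows < 2 or cols < 2 or (rows * cols) % 2:
        raise ValueError("no Hamiltonian circuit on this mesh")
    if rows % 2:
        # Odd row count: cols must be even, so build on the transposed
        # mesh and swap the coordinates back.
        return [(r, c) for (c, r) in hamiltonian_cycle(cols, rows)]
    # Row 0, left to right.
    cycle = [(0, c) for c in range(cols)]
    # Serpentine through columns 1..cols-1 of the remaining rows.
    for r in range(1, rows):
        cs = range(cols - 1, 0, -1) if r % 2 else range(1, cols)
        cycle.extend((r, c) for c in cs)
    # Return to the start along column 0, bottom to top.
    cycle.extend((r, 0) for r in range(rows - 1, 0, -1))
    return cycle


def ring_hops(placement):
    """Manhattan hop count between each core and its ring successor."""
    n = len(placement)
    return [
        abs(placement[i][0] - placement[(i + 1) % n][0])
        + abs(placement[i][1] - placement[(i + 1) % n][1])
        for i in range(n)
    ]


if __name__ == "__main__":
    ring = hamiltonian_cycle(4, 4)
    row_major = [(r, c) for r in range(4) for c in range(4)]
    print("circuit placement, worst hop:", max(ring_hops(ring)))
    print("row-major placement, worst hop:", max(ring_hops(row_major)))
```

Running the script compares the circuit placement with a naive row-major placement of the same 16-core ring; the row-major layout pays multi-hop transfers at every row boundary and at the wrap-around, which the circuit placement avoids.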