面向CKKS全同态加密算法的硬件加速器关键技术研究

Research and implementation of key technologies for CKKS fully homomorphic encryption hardware accelerator

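Author: 龚新胜
  • Student ID
    2021******
  • Degree
    Master's
  • Email
    109******com
  • Defense date
    2024.05.14
  • Advisor
    魏少军
  • Discipline
    Electronic Information
  • Pages
    63
  • Confidentiality level
    Public
  • Department
    026 School of Integrated Circuits
  • Chinese keywords
    全同态加密;硬件加速;隐私计算;软硬件协同设计
  • English keywords
    Fully homomorphic encryption; Hardware acceleration; Privacy computing; Software-hardware co-design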

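Abstract

With the development of data-driven application technologies such as big data, cloud computing, and artificial intelligence, productivity has risen greatly, but the user privacy leakage and data security problems that arise as data circulates have also become increasingly serious. Fully homomorphic encryption, a cryptographic scheme that supports complete computation on ciphertexts, is a core technology of the privacy computing stack. The CKKS algorithm is a third-generation fully homomorphic encryption scheme that lowers the complexity of homomorphic operations by introducing approximate computation, and it has been widely applied in privacy-preserving machine learning. However, CKKS still has a computational complexity more than 4 orders of magnitude higher than plaintext computation, and its compute path and data storage access path are severely mismatched. To address these problems, this thesis studies hardware acceleration for the CKKS fully homomorphic encryption algorithm from the perspectives of hardware architecture, scheduling mechanism, and circuit implementation. First, a fine-grained vectorized hardware architecture is proposed that combines static and dynamic scheduling with a streaming data processing mechanism; compared with the latest work, hardware utilization improves by 22.8%. Second, a software-hardware co-design method for the basis conversion algorithm is proposed; compared with a direct implementation, it avoids Montgomery-domain conversion while reducing modular multiplications by 33% and memory accesses by 83.3%. Finally, a homomorphic operator circuit design method for the fine-grained vectorized architecture is proposed; using on-the-fly data generation, a conflict-free access strategy, and address mapping decomposition, it reduces NTT memory accesses and automorphism storage overhead by 50.0% and 93.9%, respectively. To validate the design, this thesis verifies it and evaluates its performance on a Xilinx VCU-128 FPGA and in a TSMC 28 nm process. With a parallelism of 256, the FPGA synthesis and simulation frequency reaches 300 MHz. Performance on the HELR and Bootstrapping applications improves by 1.59 and 1.48 times, respectively. ASIC simulation results show that, for CKKS, NTT memory accesses and automorphism storage overhead are reduced by 66.6% and 74.0%, respectively. The computational density and energy efficiency of the overall implementation improve by 66.32 times and 146.01 times, respectively.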

With the development of data-driven application technologies such as big data, cloud computing, and artificial intelligence, productivity has improved greatly, but the user privacy leakage and data security problems caused by data sharing have become increasingly serious. As a cryptographic scheme that supports complete computation on ciphertext, fully homomorphic encryption is a core technology of privacy computing systems. The CKKS algorithm is a third-generation homomorphic encryption scheme that reduces the complexity of homomorphic operations by introducing approximate computation, and it has been widely used in privacy-preserving machine learning. However, the computational complexity of CKKS is more than 4 orders of magnitude higher than that of plaintext computation, and there is a serious mismatch between the computation path and the storage access path. To address these problems, this thesis studies hardware acceleration of the CKKS homomorphic encryption algorithm from the aspects of hardware architecture, scheduling mechanism, and circuit implementation. First, a fine-grained vectorized hardware architecture is proposed that combines static and dynamic scheduling with a streaming data processing mechanism. Compared with the latest work, hardware utilization is increased by 22.8%. Second, a software-hardware co-design method for the basis conversion algorithm is proposed. Compared with a direct implementation, it reduces modular multiplications by 33% and memory accesses by 83.3% while avoiding Montgomery-domain conversion. Finally, a homomorphic operator circuit design method based on the fine-grained vectorized architecture is proposed. By using on-the-fly data generation, a conflict-free access strategy, and address mapping decomposition, the number of memory accesses of the NTT units and the storage overhead of the automorphism units are reduced by 50.0% and 93.9%, respectively. To verify the design, this thesis verifies it and evaluates its performance on a Xilinx VCU-128 FPGA and on the TSMC 28-nm LVT technology node. When the degree of parallelism reaches 256, the maximum synthesized frequency on the FPGA reaches 300 MHz. Performance on HELR and Bootstrapping is improved by 1.59 and 1.48 times, respectively. The ASIC simulation results show that the areas of the NTT and automorphism units are reduced by 66.6% and 87.0%, respectively. The overall computational density and energy efficiency are improved by 66.32 times and 146.01 times, respectively.
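
For readers unfamiliar with the basis conversion step mentioned in the abstract, the following is a minimal software sketch of the textbook approximate RNS basis conversion. It is not the co-designed method of the thesis: the function name `fast_basis_conversion` and the toy moduli are assumptions made for illustration, and no Montgomery arithmetic is modeled. It only shows where the per-residue modular multiplications and table lookups come from.

```python
from math import prod


def fast_basis_conversion(residues, src_moduli, dst_moduli):
    """Approximate RNS basis conversion (textbook form, illustrative only).

    Given x as residues modulo pairwise-coprime src_moduli q_i, return
    residues of x + u*Q modulo each p_j in dst_moduli, where Q is the
    product of the q_i and u is a small integer with 0 <= u < len(src_moduli).
    """
    Q = prod(src_moduli)

    # Step 1: y_i = x_i * (Q/q_i)^(-1) mod q_i, one multiplication per source limb.
    ys = []
    for x_i, q_i in zip(residues, src_moduli):
        q_hat = Q // q_i
        q_hat_inv = pow(q_hat % q_i, -1, q_i)  # modular inverse (Python 3.8+)
        ys.append((x_i * q_hat_inv) % q_i)

    # Step 2: for each target modulus, accumulate y_i * (Q/q_i) mod p_j.
    out = []
    for p_j in dst_moduli:
        acc = 0
        for y_i, q_i in zip(ys, src_moduli):
            acc = (acc + y_i * ((Q // q_i) % p_j)) % p_j
        out.append(acc)
    return out


# Toy example with small primes (real CKKS moduli are far larger).
src = [97, 193, 257]
dst = [769, 12289]
x = 1234567
converted = fast_basis_conversion([x % q for q in src], src, dst)
```

In a hardware setting, the constants `(Q/q_i)^(-1) mod q_i` and `(Q/q_i) mod p_j` would normally be precomputed; the sketch recomputes them inline only to stay short.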