登录 EN

添加临时用户

后量子密码算法CRYSTALS-KYBER的高速实现研究

Research on the high-speed implementation of post quantum cryptography algorithm CRYSTALS-KYBER

作者:马列君
  • 学号
    2018******
  • 学位
    硕士
  • 电子邮箱
    231******com
  • 答辩日期
    2021.05.19
  • 导师
    吴行军
  • 学科名
    集成电路工程
  • 页码
    76
  • 保密级别
    公开
  • 培养单位
    026 集成电路学院
  • 中文关键词
    后量子密码,FPGA,CRYSTALS-KYBER密码算法,快速数论变换
  • 英文关键词
    PQC,FPGA,CRYSTALS-KYBER,NTT

摘要

信息安全问题一直是当代信息社会研究的热点问题,它涉及到通信传输、网络安全、加密货币等方方面面;现有的各类信息加密技术是基于一系列数学困难问题构建的,这些数学困难问题被认为在当前人们所掌握的计算能力下是NP难问题(Non-deterministic Polynomial Hard Problem)。然而随着量子技术的发展,量子计算机的出现使得人们所掌握的计算能力得到极大的增强,给信息安全问题带来了全新的挑战,许多原有的基于经典NP难问题的密码系统在量子计算机强大的算力下将不再安全,如基于大素数分解问题的RSA密码系统和基于椭圆曲线上离散对数问题的ECC密码系统。后量子密码算法(PQC, Post-Quantum Cryptography) 的研究越来越受到人们广泛的关注。PQC算法所基于的数学原理被证明为即使在量子算力下也是NP难问题。然而各种PQC算法由于运算过程复杂,运算数据量较大,缺乏统一的标准协议等问题,目前主要停留在算法层面的研究工作,距离实现应用还有一段距离。本文针对PQC算法中的基于格的后量子密码算法CRYSTALS-KYBER算法进行硬件实现研究,通过对其中关键运算模块(多项式乘法器)的专用化电路设计,提出了一种针对于CRYSTALS-KYBER算法的硬件加速方案。本文根据CRYSTALS-KYBER算法的运算特点,基于Xilinx Zynq 7010 FPGA平台设计了多项式乘法器加速电路,通过并行快速数论变换(NTT, Number Theoretic Transform)算法,减少了商环上的多项式乘法这一操作的运算量,同时利用CRYSTALS-KYBER算法中多项式互不相关的特点,通过交织运行的方法消除了并行NTT算法中的等待时间,提高了电路的运算效率。此外,本文中的设计尽可能多的将CRYSTALS-KYBER算法中的操作移植到多项式乘法器加速电路上运行,既能通过硬件加速减少这些操作的运行时间又能复用多项式乘法器中的硬件资源,提高电路的利用率;同时添加了拓展存储电路进行数据缓存,减少了软硬件之间的数据通信时间,与现有实现方案相比,对NTT变换的加速效率在最好情况下能提高50%左右。此外,本文中还为多项式乘法器加速电路设计了整个算法实现层面的验证电路,即CRYSTALS-KYBER算法的实现电路,通过运行CRYSTALS-KYBER-CCAKEM密钥封装方案,比较进行一次封装与解封装算法的绝对运行时间;实验结果表明,本文中的多项式乘法器硬件加速电路对算法加速效果明显,与现有实现方案相比,对CRYSTALS-KYBER-CCAKEM密码系统的加速效率能平均提高27%左右。

Information security has always been a hot issue in contemporary information society, which involves communication transmission, network security, cryptocurrency and other aspects. The existing information encryption technologies are based on a series of mathematical difficult problems, which are considered to be NP-hard problems under the current computing power. However, with the development of quantum cryptography many cryptosystems based on classical NP-hard problems will not be secure any more. Such as RSA cryptosystem based on large prime decomposition and ECC cryptosystem based on elliptic curve discrete logarithm. Post quantum cryptography (PQC) has attracted more and more attention. The mathematical foundation of PQC algorithm is proved to be NP-hard even under quantum computing power. However, due to the complex operation process, large amount of data and lack of unified standard protocol, all kinds of PQC algorithms mainly stay in the research of algorithm level, which is still a long way from the implementation and application. In this paper, CRYSTALS-KYBER algorithm which is one of the lattice based post quantum cryptography algorithms in PQC algorithm is selected as the research object. A hardware acceleration scheme for CRYSTALS-KYBER algorithm is proposed through the specialized circuit design of the key computational module (polynomial multiplier).According to the operation characteristics of CRYSTALS-KYBER algorithm, a polynomial multiplier acceleration circuit is designed based on Xilinx Zynq 7010 FPGA platform. At the same time, with the characteristic of unrelated polynomials in crystals-kyber algorithm, the waiting time in parallel NTT algorithm is eliminated by interleaving operation and the efficiency of the circuit is improved. In addition, the design in this paper transplants as much as possible operations of CRYSTALS-KYBER algorithm to the polynomial multiplier acceleration circuit, which can not only speed up the operation time through hardware but also reuse the hardware resources of polynomial multiplier to improve the utilization of the circuit. We also add the extended storage circuit for data cache, reducing the data communication time between software and hardware. Compared with the existing implementation scheme, the acceleration efficiency of NTT transformation in this paper can be improved by about 50% in the best case.In addition, this paper also designs the implementation circuit of CRYSTALS-KYBER algorithm as the upper verification circuit for the polynomial multiplier acceleration circuit. By running CRYSTALS-KYBER-CCAKEM key encapsulation scheme, we compare the running time of encapsulation and decapsulation algorithm. The experimental results show that the polynomial multiplier hardware acceleration circuit in this paper has obvious acceleration effect on the algorithm. Compared with the existing implementation scheme, the acceleration efficiency of CRYSTALS-KYBER-CCAKEM in this paper can be improved by 27% on average.