Public-key cryptography is one of the great inventions in the field of cryptography. In today's information age, public-key algorithms play an irreplaceable role in establishing secure multi-party communication channels and in ensuring the confidentiality, non-repudiation, and authenticity of data and information. With the rapid development of Internet and cloud computing technologies, data storage and electronic communication place ever higher demands on information security. At the same time, the emergence of quantum computing, combined with quantum algorithms, threatens the security and reliability of traditional public-key algorithms through its powerful non-classical computing power. To maintain long-term security for communication and data storage, NIST launched a worldwide call for post-quantum cryptography standards in 2016, and the process has now reached the third round of evaluation. Four public-key encryption or key encapsulation mechanism (KEM) schemes entered the third round as finalists. NTRU, a lattice-based scheme, is one of them. NTRU is built mainly on multiplication in a polynomial ring, supplemented by the latest hash standard, SHA3-256, to form a key encapsulation mechanism (NTRU-KEM). Targeting the latest submitted version of the algorithm, this paper proposes a complete, fully hardware implementation of NTRU encryption/decryption and KEM with the highest-security-level parameter set. On the circuit side, with high-speed operation as the goal, a compact NTRU encryption/decryption architecture is proposed; its modules are designed one by one and then integrated into a complete key encapsulation architecture.
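For orientation, the encryption, decryption, and key-derivation steps of the round-3 NTRU specification, as we understand it, are summarized below. This is our own summary, not a transcription of the paper's design. The key material h, f, f_p, h_q is assumed to be precomputed; Φ_1 = x − 1 and Φ_n = (x^n − 1)/(x − 1), so Φ_1Φ_n = x^n − 1; Lift(m) is left abstract, and the validity checks and the implicit-rejection branch of decapsulation are omitted. The summary shows where the different ring multiplications arise: ternary-by-general products (r·h and c·f), a product modulo 3 (a·f_p), and a general product modulo q ((c − Lift(m))·h_q).

```latex
\begin{aligned}
&\textbf{Encrypt:} && c = \big(r \cdot h + \mathrm{Lift}(m)\big) \bmod (q,\ \Phi_1\Phi_n),\\
&\textbf{Decrypt:} && a = (c \cdot f) \bmod (q,\ \Phi_1\Phi_n),\\
& && m = (a \cdot f_p) \bmod (3,\ \Phi_n),\\
& && r = \big((c - \mathrm{Lift}(m)) \cdot h_q\big) \bmod (q,\ \Phi_n),\\
&\textbf{KEM key:} && k = \mathrm{SHA3\text{-}256}\big(\mathrm{pack}(r) \,\|\, \mathrm{pack}(m)\big).
\end{aligned}
```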
For polynomial multiplication in the ring, a linear shift register structure performs the modular reduction of polynomials, and the special form of ternary polynomial coefficients is exploited to accelerate multiplications involving ternary polynomials. For multiplication between general polynomials, we exploit the parallelism and flexibility of hardware to design a dedicated arithmetic module. In addition, based on the characteristics of the three different kinds of polynomial multiplication in the algorithm, a data conversion module maps all of them onto the same linear shift register structure, which allows resources to be reused and keeps the different modules compact. The SHA3-256 module is also designed with area optimization in mind. After the design was completed, its functional correctness was verified against multiple sets of reference vectors, and the NTRU hardware design was implemented on an FPGA platform and synthesized for an ASIC platform. Simulation results show that the design reaches a maximum operating frequency of 66.7 MHz on a Xilinx Zynq-7000 series FPGA, with equivalent key encapsulation and decapsulation times of 12.315 μs and 112.320 μs, respectively. Synthesis results show that in a UMC 55 nm process the design reaches an operating frequency of 180 MHz, with equivalent key encapsulation and decapsulation times of 4.516 μs and 41.184 μs, respectively.
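The shift-register approach to ring multiplication can be illustrated with a small software model. The sketch below is our own, not the paper's RTL, and the helper names (`rotate`, `mul_ternary`, `mul_general`, `mod_phi_n`) are hypothetical. A register holds x^i·b mod (x^n − 1), which is simply a cyclic shift per cycle, while an accumulator adds the register contents weighted by the i-th coefficient of the other operand. When that operand is ternary, the per-lane multiply collapses to add, subtract, or skip; general operands need a full multiply-accumulate. The Φ_n reduction follows the round-3 NTRU convention as we understand it.

```python
# Software model of shift-register multiplication in Z_q[x]/(x^n - 1).
# Hypothetical helper names; each list comprehension stands for n lanes
# that a hardware datapath would update in the same clock cycle.
from typing import List


def rotate(reg: List[int]) -> List[int]:
    """Multiply by x modulo x^n - 1: the top coefficient wraps to index 0."""
    return [reg[-1]] + reg[:-1]


def mul_ternary(t: List[int], b: List[int], q: int) -> List[int]:
    """t * b mod (q, x^n - 1), where every coefficient of t is -1, 0 or 1.

    Cycle i adds, subtracts or skips the register x^i * b, so the lanes
    need adders only, no multipliers.
    """
    acc = [0] * len(b)
    reg = list(b)  # holds x^i * b mod (x^n - 1) during cycle i
    for ti in t:
        if ti == 1:
            acc = [(s + r) % q for s, r in zip(acc, reg)]
        elif ti == -1:
            acc = [(s - r) % q for s, r in zip(acc, reg)]
        reg = rotate(reg)  # one shift per cycle
    return acc


def mul_general(a: List[int], b: List[int], q: int) -> List[int]:
    """a * b mod (q, x^n - 1) for general coefficients.

    Each cycle performs one multiply-accumulate in every lane.
    """
    acc = [0] * len(b)
    reg = list(b)
    for ai in a:
        acc = [(s + ai * r) % q for s, r in zip(acc, reg)]
        reg = rotate(reg)
    return acc


def mod_phi_n(c: List[int], p: int) -> List[int]:
    """Reduce c (degree < n) modulo Phi_n = 1 + x + ... + x^(n-1) over Z_p.

    Subtracting c[n-1] * Phi_n clears the top coefficient.
    """
    top = c[-1]
    return [(ci - top) % p for ci in c]


if __name__ == "__main__":
    # (1 - x)(1 + 2x + 3x^2) mod (16, x^3 - 1) = -2 + x + x^2
    print(mul_ternary([1, -1, 0], [1, 2, 3], 16))  # [14, 1, 1]
    # 15 = -1 mod 16, so the general path gives the same product
    print(mul_general([1, 15, 0], [1, 2, 3], 16))  # [14, 1, 1]
    # (2 + x + x^2) mod (3, 1 + x + x^2) = 1
    print(mod_phi_n([2, 1, 1], 3))  # [1, 0, 0]
```

The n-iteration loop mirrors an n-cycle schedule in which all n lanes work in parallel. With a power-of-two modulus such as q = 4096 (the value in the ntruhps4096821 parameter set), `% q` amounts to keeping the low 12 bits, so in hardware that reduction is essentially free.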