The rapid development of side-channel attacks poses an increasingly serious security threat to today's cryptographic devices. Because masking offers rigorous, mathematically provable security, it is widely used to harden cryptographic devices and systems against side-channel attacks. However, as emerging algorithms such as post-quantum and lightweight cryptography move through standardization, the ability to build masked implementations that balance side-channel security against implementation efficiency has become an important criterion for evaluating candidate algorithms, and one that influences the final standards. To address this problem, this work studies in depth arithmetic-to-Boolean mask conversion algorithms for multiple application scenarios and high-performance hardware masking of lightweight cryptographic algorithms. First, for arithmetic-to-Boolean mask conversion, this work proposes three algorithms based on precomputed tables to meet the needs of different application scenarios. The memory-optimized algorithm, called the Mixed Goubin algorithm, provides security against first-order differential power analysis with the smallest memory footprint while avoiding the problem of precomputed tables whose size is not fixed. With a segment width of 4 bits, its memory overhead for 8-bit, 16-bit, 32-bit, and 64-bit conversions is reduced by 9.1%, 23.1%, 41.2%, and 60%, respectively. The speed-optimized algorithm, called the full-table algorithm, pushes past the performance limit of the fastest existing conversion algorithm: it is 6.5% faster than the best-performing Debraize algorithm. In addition, a sub-table algorithm is proposed as a trade-off between speed and memory usage.
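To make the conversion problem concrete, the following is a minimal functional sketch of the classical carry-recurrence approach to arithmetic-to-Boolean conversion that Goubin's name is associated with. It is not one of the table-based algorithms proposed in this work, whose details the abstract does not give. The function name `goubin_a2b` and its parameters are hypothetical, and a Python model only checks functional correctness; it says nothing about leakage on a real device.

```python
import secrets


def goubin_a2b(a_share: int, r: int, k: int = 8) -> int:
    """Convert an arithmetic sharing into a Boolean sharing (illustrative only).

    Input:  (a_share, r) with secret x = (a_share + r) mod 2**k.
    Output: x_prime such that x = x_prime ^ r.
    The secret x is never computed directly; a fresh random word gamma
    masks the carry chain of the addition.
    """
    mask = (1 << k) - 1
    gamma = secrets.randbits(k)        # fresh randomness for this conversion
    t = (gamma << 1) & mask            # masked carry word, starts as 2*gamma
    x_prime = gamma ^ r
    omega = gamma & x_prime
    x_prime = t ^ a_share
    gamma ^= x_prime
    gamma &= r
    omega ^= gamma
    gamma = t & a_share
    omega ^= gamma                     # omega is now the loop constant
    for _ in range(k - 1):             # one step per carry propagation
        gamma = t & r
        gamma ^= omega
        t &= a_share
        gamma ^= t
        t = (gamma << 1) & mask
    return x_prime ^ t


# Usage: share an 8-bit secret arithmetically, then convert.
x = 0xA7
r = secrets.randbits(8)
a_share = (x - r) % 256
assert goubin_a2b(a_share, r, 8) ^ r == x
```

The loop runs once per bit of carry propagation, so the cost of this approach grows with the word size; trading precomputed-table memory for fewer such steps is the kind of design space the algorithms above explore.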
Both theoretical security analysis and physical leakage detection confirm that the three arithmetic-to-Boolean mask conversion algorithms proposed in this work are secure against first-order differential power analysis.
Second, this work analyzes and implements domain-oriented masking for Xoodyak, a candidate in the third round of the lightweight cryptography standardization process. It first evaluates the security of the domain-oriented masking scheme for Xoodyak. It then analyzes the applicability of existing randomness-reduction techniques and shows that they are not suitable for Xoodyak's first-order domain-oriented masking scheme. A new randomness-reduction technique is therefore proposed that reduces the amount of randomness required by the masked Xoodyak implementation by a factor of three. Finally, the protected design is implemented on FPGA and evaluated on ASIC. On FPGA, the unprotected design in this work reduces resource overhead by 40.6% to 43.7% compared with other implementations in the literature while improving performance by 47.6% to 62.4%. The protected implementation reduces resource overhead by 18.7% to 39.6% compared with protected implementations in the literature while improving performance by 8.7% to 9.3%. On ASIC, the unprotected design reduces resource overhead by a factor of five compared with other implementations in the literature while improving performance by 19.7%. The protected implementation even uses 15.9% to 25.2% fewer resources than unprotected implementations in the literature, with almost unchanged performance.
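To show where the randomness cost discussed above comes from, here is a minimal sketch of a generic first-order domain-oriented masking AND gate at bit level. Each masked AND consumes one fresh random bit, which is why reducing randomness matters for a design with many AND operations in its nonlinear layer. The names `share` and `dom_and` are hypothetical; this models only the arithmetic, not the register stages that a hardware implementation needs, and it is not the randomness-reduction technique proposed in this work.

```python
import secrets


def share(bit: int) -> tuple[int, int]:
    """Split a bit into two Boolean shares whose XOR equals the bit."""
    m = secrets.randbits(1)
    return bit ^ m, m


def dom_and(a_shares: tuple[int, int], b_shares: tuple[int, int]) -> tuple[int, int]:
    """First-order masked AND on two-share inputs (functional model only).

    Inner-domain terms (a0&b0, a1&b1) stay within one share domain.
    Cross-domain terms (a0&b1, a1&b0) are refreshed with the same fresh
    random bit z before being folded back into their domain.
    """
    a0, a1 = a_shares
    b0, b1 = b_shares
    z = secrets.randbits(1)            # one fresh random bit per AND gate
    q0 = (a0 & b0) ^ ((a0 & b1) ^ z)
    q1 = (a1 & b1) ^ ((a1 & b0) ^ z)
    return q0, q1


# Usage: the recombined output equals the AND of the unmasked inputs.
q0, q1 = dom_and(share(1), share(1))
assert q0 ^ q1 == 1
```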