登录 EN

添加临时用户

分布式存储系统中最优局部修复码的构造与理论界的改进

Constructions and Improved Theorectical Bounds of Optimal Locally Repairable Codes in Distributed Storage Systems

作者:陈斌
  • 学号
    2017******
  • 学位
    博士
  • 电子邮箱
    cb1******.cn
  • 答辩日期
    2020.12.11
  • 导师
    夏树涛
  • 学科名
    计算机科学与技术
  • 页码
    104
  • 保密级别
    公开
  • 培养单位
    024 计算机系
  • 中文关键词
    分布式存储,局部修复码,线性码,循环码,有限几何
  • 英文关键词
    Distributed storage systems,locally repairable codes, linear codes, cyclic codes,finite geometry

摘要

局部修复码在分布式系统中的应用与理论研究成为了近年来信息论和计算机领域的热门研究方向,它在增加数据冗余与保持数据修复的局部性方面起到了很好的权衡作用,进而提升了分布式存储系统中数据删除的修复效率。局部修复码的理论界和最优码的构造是编码理论的核心问题,同时也为分布式存储系统设计更好参数的编码方案提供依据和保证。局部修复码关于极小距离的理论界在一定程度上代表同参数下码的纠错检错性能的理论极限,Gopalan等人从经典纠错码的角度出发给出了线性局部修复码极小距离的Singleton上界,构造达到该理论界的线性码被称为最优局部修复码。最优局部修复码的更大码长的构造与最大码长的理论界决定了码的码率。因此,有必要设计更长码长的最优码和确定最优码的最大码长,这对降低分布式存储系统的存储开销至关重要。在分布式存储系统中,除了局部性这一重要的性能指标外,存储冗余与修复带宽也是影响存储节点修复效率的两个重要指标。设计同时具备低修复带宽,低存储冗余,好的局部性的纠错码,进一步建立三者之间的数学联系仍然是一个待解决的问题。另外,在考虑分布式存储系统中热数据与数据不平衡等因素时,设计更加适用于实际场景的纠错码也很有必要。本文从经典纠错码的角度出发,研究分布式存储系统中最优局部修复码的构造和理论界的改进,主要研究内容和贡献可以总结如下:1. 借助循环码和常循环码的零点结构,我们首先得到了码长$n\mid (q+1)$的全部参数的最优$(r,\delta)$局部修复码的构造,另外我们得到的码长为$n\mid (q-1)$的最优循环$(r,\delta)$局部修复码直接推广了Tamo等人的构造,这进一步丰富了最优码的参数范围。2. 针对极小距离为5和6的最优局部修复码,我们基于校验矩阵方法给出了改进的最大码长理论界,该理论界在某些特殊参数下是紧的。对于局部性参数$r=2$和极小距离$d=6$的情形,我们给出了一类最优且码长为$n=3(q+1)$的太阳花构造,该构造在目前同参数下的码长是最长的,且在$q=3, 4, 5$时达到最大码长。另外,我们给出了该参数下最优码存在的完全几何刻画。基于该几何刻画,我们得到了目前最好的最大码长理论上界。3. 针对分布式存储系统中的热数据与数据不平衡问题,提出了一个不等局部性的概念并得到相应的理论界和构造,建立了一个分 布式存储系统中最优标量局部修复码的修复带宽理论框架,并给出了一种低带宽的修复方案。

The application and theoretical research of locally repairable codes (LRCs) in distributed systems have become a popular research direction in the field of information theory society and computer society in recent years. LRCs have played a good trade-off role in increasing data redundancy and maintaining repairing locality, thereby improving the repairing efficiency of data erasures in distributed storage systems. The theoretical bounds and optimal constructions of LRCs are the core research problem in coding theory, which provides the basis and guarantee for designing coding schemes with better parameters in distributed storage systems. The theoretical bound on the minimum distances of LRCs contributes to the theoretical limit of error correction and error detection performance under the same parameters. Gopalan et al. proposed a Singleton-like upper boundfrom the perspective of classic error-correcting code, and a linear code achieving this bound is called an optimal LRCs. The optimal construction of locally repairable codes with larger code length and theoretical bound on the maximum code length both determine the code rate. Therefore, it is necessary to design an optimal code with a larger code length and determine the maximum code length of the optimal codes, which is crucial to further reduce the storage overhead in a distributed storage system. In a distributed storage system, in addition to the important performance metric likes locality, storage redundancy, and repair bandwidth are other two important metrics that largely affect the repair efficiency of storage nodes. The design of coding schemes simultaneously maintaining low repair bandwidth, low memory redundancy, small locality, and further establishing the mathematical relationship among these three metrics is still an unsolved problem. In addition, when faced with hot data, imbalanced data, and other impact factors in distributed storage systems, it is necessary to design error correction codes that are more suitable for actual scenarios.From the perspective of classic error-correcting codes, this paper studies the construction and improvements of theoretical bound of optimal locally repairable codes in distributed storage systems. The main contributions can be summarized as follows:1. Based on the zeros' structure of cyclic codes and constacyclic codes, we first obtain the constructions of optimal $(r,\delta)$ locally repairable codes with length $n\mid q+1$ for all parameters. In addition, the constructions of optimal $(r,\delta)$ locally repairable codes with length $n\mid q-1$ directly generalize the results of Tamo et al., which further enriches the parameter range of the optimal codes.2. For the optimal locally repairable codes with minimum distances 5 and 6, we propose an improved theoretical bound on the maximum code length via the parity-check matrix approach, which is tight under some special parameters. For locality parameter $r=2$ and minimum distance $d=6$, we propose a class of optimal sunflower construction with code length $n=3(q+1)$. The code length is the longest until now and achieves the maximum code length when $q=3,4,5$. Moreover, we have established a complete geometric characterization of the existence of optimal codes under such parameters. Based on this geometric characterization, we have obtained the best upper bound for the maximum code length under the same parameters.3.Considering the problem of hot data and imbalanced data in distributed storage systems, we proposed the concept of unequal multiple $(r_i,\delta_i)$ localities for the first time, then establish a Singleton-like theoretical upper bound on the minimum distance and obtain corresponding optimal construction. This theoretical upper bound directly generalizes the results of Zeh et al., and also provides a new coding scheme that enhances the repairing and erasure-correcting capability for different nodes in an actual distributed storage system. In order to design an optimal scalar locally repairable code that has both favorable locality and low repair bandwidth, we have established the theoretical lower bound on the repair bandwidth of the optimal scalar locally repairable code based on the repair framework of the Reed-Solomon code proposed by Guruswami et al. Two types of low-bandwidth repair schemes are obtained for two well-known optimal $(r,\delta)$ locally repairable code, which further provides a reference for jointly optimizing various performance metrics in distributed storage systems.