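In crowdsensing systems, the sensitive information contained in the data exposes the data publishing process to potential privacy leakage. Working within the framework of local differential privacy, this dissertation aims to design privacy-preserving mechanisms for data publishing in crowdsensing systems. Following the sensing, encoding, and analysis stages of crowdsensing data publishing, it studies the problem at four levels: data availability, data types, personalized privacy requirements, and data analysis. It identifies the shortcomings of existing privacy-preserving mechanisms in publishing statistics over missing data, publishing correlation information for key-value data, supporting personalized privacy protection, and publishing models. Specifically, the main contributions of this dissertation are as follows:

To address the privacy challenges and statistical bias caused by missing data, this dissertation proposes a method for publishing statistics over missing data based on a bi-directional sampling mechanism. By introducing positive sampling and negative sampling, two perturbation mechanisms with different probability distributions, the bi-directional sampling mechanism satisfies local differential privacy while giving unbiased estimates of both the missing rate and the mean, which avoids the bias that missing data would otherwise introduce.

To address the challenge that the differing data types of keys and values pose for correlation analysis, this dissertation proposes a method for publishing correlation information for key-value data. It defines, for the first time, the frequency correlation and mean correlation estimation problems on key-value data. The proposed Indexing One-Hot encoding mechanism provides local differential privacy for multi-key-value data and, on that basis, supports publishing the correlation information of key-value data.

To address the loss of data utility caused by users' personalized privacy requirements, this dissertation proposes a data-utility optimization method under personalized local differential privacy. It proposes a Stepwise mechanism that follows a "discretization--expansion--perturbation" scheme to achieve unbiased data perturbation. Theoretical analysis shows that a linear combination of a Stepwise mechanism with a high privacy budget and Generalized Randomized Response is equivalent to a Stepwise mechanism with a low privacy budget. A data recycling strategy built on this property can be used to optimize data utility in personalized privacy-preserving scenarios.

To address the lack of a rigorous privacy definition in current privacy-preserving data analysis, this dissertation proposes a distance-preserving encoding mechanism under local differential privacy together with a corresponding non-interactive clustering method. The encoding mechanism provides provable privacy protection by applying Randomized Response to the data in the anonymized space. Building on the distance-preserving and privacy-preserving properties of the encoding mechanism, this dissertation designs a non-interactive clustering model publishing method that extends existing data analysis scenarios.

In summary, this dissertation addresses the data publishing problem in crowdsensing systems by proposing several perturbation mechanisms based on local differential privacy, providing key technical and theoretical support for privacy-preserving data publishing.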
Sensitive information contained in crowdsensing data creates a risk of privacy leakage during data publishing in crowdsensing systems. The goal of this dissertation is to design privacy-preserving mechanisms for data publishing in crowdsensing systems under local differential privacy. It investigates privacy-preserving data publishing from four aspects: data collection, the diversity of data types, personalized privacy concerns, and data analysis demands. Existing privacy-preserving mechanisms cannot effectively handle missing data, correlations in key-value data, personalized privacy requirements, or model publishing for data analysis, so this dissertation proposes several privacy-preserving mechanisms to address these issues. Specifically, the main contributions of this dissertation are as follows.

BiSample, a bi-directional sampling mechanism for data perturbation, is proposed to avoid the privacy leakage and statistical bias caused by missing data. By introducing positive sampling and negative sampling techniques, BiSample perturbs data with missing values under local differential privacy. Theoretical analysis shows that BiSample yields unbiased estimates of the missing rate and the mean.

A perturbation mechanism for correlation analysis is proposed for key-value data, where keys are categorical and values are numerical. For the first time, this dissertation defines the frequency correlation and mean correlation estimation problems for key-value data. The proposed Indexing One-Hot mechanism supports publishing correlation information under local differential privacy.

A data-utility optimization framework is proposed under personalized local differential privacy. Using a discretization--expansion--perturbation scheme, the proposed Stepwise mechanism achieves unbiased statistical data publishing with local differential privacy guarantees. Theoretical analysis shows that a linear combination of a Stepwise mechanism with a high privacy budget and Generalized Randomized Response is equivalent to a Stepwise mechanism with a low privacy budget. This property is then used to design a data recycling mechanism that optimizes data utility under personalized local differential privacy.

A distance-aware encoding mechanism and a corresponding clustering algorithm are proposed to address the lack of rigorous privacy definitions in current privacy-preserving data analysis. By applying Randomized Response in the anonymized space, the proposed encoding mechanism provides provable privacy guarantees. Based on its distance-aware property, a non-interactive clustering algorithm is introduced for data mining in crowdsensing systems.

In summary, this dissertation proposes a series of perturbation mechanisms with local differential privacy for data publishing in crowdsensing systems. These mechanisms provide technical and theoretical support for applying local differential privacy in crowdsensing systems.
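
Several of the contributions above build on Generalized Randomized Response and on unbiased estimation from perturbed reports. As a rough illustration of that building block only, the following is a minimal sketch of the standard textbook form of Generalized Randomized Response over a categorical domain, together with its usual unbiased frequency estimator. It is not the dissertation's BiSample, Indexing One-Hot, or Stepwise mechanism. The function names `grr_perturb` and `grr_estimate`, the example domain, and the privacy budget are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Client side: keep the true value with probability p, otherwise
    report one of the other k - 1 values uniformly at random.
    (Hypothetical helper name; standard GRR, assumed for illustration.)"""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimate for each domain value.
    Since E[observed share of v] = f_v * p + (1 - f_v) * q, inverting
    gives f_v = (observed share - q) / (p - q)."""
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)   # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting one specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["a", "b", "c", "d"]  # assumed toy domain
    data = ["a"] * 5000 + ["b"] * 3000 + ["c"] * 1500 + ["d"] * 500
    reports = [grr_perturb(v, domain, 2.0, rng) for v in data]
    print(grr_estimate(reports, domain, 2.0))
```

Each user runs `grr_perturb` locally, so the collector only ever sees perturbed reports. The estimator corrects for the known perturbation probabilities, which is the sense in which the published statistics are unbiased. Individual estimates can fall slightly below zero or above one for rare values, and any post-processing of such values is a separate design choice.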