Pre-trained algorithmic models, acting as AI agents, have been found to acquire human social biases when applied in society. Algorithmic bias stems from imbalanced data distributions and shortcomings in algorithm design; it can lead models to make discriminatory decisions and reinforce stereotypes. Language models, trained on broad corpora, readily absorb the biased views expressed in text, and the recent emergence of the contextualized language model BERT has challenged existing bias research. Taking the Chinese BERT model as the research object and occupational gender bias as the topic, this study explores how BERT bias can be analyzed, quantified, and mitigated, and completes the following work:

First, this study proposes a research framework for algorithmic bias based on value sensitive design (VSD). VSD is a technology design method that takes human values into account. Drawing on it, this study proposes a three-part analysis framework for algorithmic bias: (1) conceptual analysis, which defines the concept of bias, identifies stakeholders' positions, and clarifies governance goals; (2) empirical analysis, which establishes measurement standards for bias and quantifies stakeholders' ethical demands through social surveys; (3) technical analysis, which opens the algorithmic black box to quantify and constrain bias.

Second, this study proposes a bias measurement method based on the masked language model, the Unmasked Predicted Likelihood (UPL), and finds that BERT exhibits a preference for males. UPL feeds in unmasked test sentences and predicts the probability of every word, quantifying the semantic association between occupation words and gender words. The study finds that BERT's bias values are positively correlated with the frequency of male words in the training corpus; technical positions are the most male-leaning, with bias values 3.2 times those of administrative positions. UPL avoids the fluctuations in prediction probability caused by BERT fine-tuning and is therefore more stable than existing methods.

Third, this study proposes a regularization debiasing method based on a word-vector subspace, which reduces BERT's bias level by more than 90%. Applying a matrix transformation to the differences between the representation vectors of occupation and gender words yields a dense subspace that defines the bias semantics; adding the distance between this subspace and the word embedding vectors to BERT's objective function as a regularization term constrains algorithmic bias. Experiments show that the method reduces the UPL bias value by more than 90%, bringing bias down to 10.9-88.2% of the baseline level, constrains BERT bias across different dataset settings, and does not harm the model's normal semantics.

Fourth, this study conducts a practical validation in a resume retrieval scenario. BERT is trained with DeText, LinkedIn's retrieval framework, and the resume rankings before and after debiasing are compared. The original BERT model is biased against female resumes, with male resumes ranked 9.5 places higher on average; regularization debiasing narrows the ranking gap by 42.8-91.3%, reducing the overall average gap to 3.5 places.
Pre-trained language models, acting as AI agents, have been found to acquire human social biases in their social applications. Algorithmic bias arises from imbalanced data distributions and inadequate algorithm design, and can lead to discriminatory decisions that harm vulnerable groups. Trained on extensive corpora, pre-trained language models readily absorb the biases present in text, and in recent years the emergence of the contextualized language model BERT has challenged existing bias studies. Taking the Chinese BERT model as the research object and occupational gender bias as the subject, this study explores how to analyze, quantify, and mitigate BERT bias, and completes the following work:

(1) This study proposes a research framework for algorithmic bias based on value sensitive design (VSD), addressing the difficulty of defining bias. VSD is a technology design method that takes human values into account. Building on VSD, this study proposes a three-part analysis framework for algorithmic bias: conceptual analysis, which defines the concept of bias by learning the positions of stakeholders and clarifying governance objectives; empirical analysis, which establishes bias measurement standards and quantifies stakeholders' ethical appeals through quantitative or qualitative research; and technical analysis, which opens the "black box" of the algorithm to quantify and mitigate the bias encoded in BERT.

(2) This study proposes a bias measurement method based on the masked language model, the Unmasked Predicted Likelihood (UPL), and finds that Chinese BERT prefers male-associated occupations. UPL quantifies the semantic correlation between occupation words and gender words by feeding in full test sentences with no masked words and predicting the probability of every word. Because UPL retains the entire context, it avoids the fluctuations in prediction probability introduced by BERT fine-tuning and therefore measures bias more reliably. The study finds that occupation words in the Chinese BERT model are more strongly associated with men: technical positions are the most male-biased, with bias 3.2 times that of administrative positions, and the UPL bias is positively correlated with the frequency of male words in the training corpus.

(3) This study proposes a regularization debiasing method based on a word-vector subspace, which substantially reduces BERT's bias level. Applying a matrix transformation to the differences between the representation vectors of occupation and gender words yields a dense subspace encoding occupational gender semantics. Adding the Euclidean distance between this subspace and BERT's word embedding vectors to the training objective as a loss term mitigates the bias. Experimental results show that the method reduces BERT's UPL bias by more than 90%, bringing bias down to 10.9-88.2% of the baseline level, without affecting BERT's normal semantic representations.

(4) This study validates the approach in a resume retrieval scenario. BERT is trained with DeText, LinkedIn's retrieval framework, and the rankings of male and female resumes are compared before and after debiasing. The original BERT model is biased against female resumes, with male resumes ranked 9.5 places higher on average. Regularization debiasing narrows the ranking gap by 42.8-91.3%, reducing the overall average gap to 3.5 places.
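To make the UPL idea concrete, the following is a minimal sketch of scoring an unmasked sentence with a masked-language-model head, assuming the public "bert-base-chinese" checkpoint; the sentence templates ("他/她是一名…") and the way the per-token probabilities are aggregated are illustrative assumptions, not the thesis's exact protocol.

```python
# Sketch of unmasked-likelihood bias scoring (assumed template and aggregation).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Feed the full, unmasked sentence and sum the log-probabilities the
    MLM head assigns to the tokens that are actually present."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits              # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    ids = inputs["input_ids"][0]
    # Gather the log-prob of each observed token, skipping [CLS] and [SEP].
    token_scores = log_probs[0, torch.arange(len(ids)), ids][1:-1]
    return token_scores.sum().item()

def gender_bias_score(occupation: str) -> float:
    """Positive value = the male-subject sentence is more likely than the female one."""
    male = sentence_log_likelihood(f"他是一名{occupation}")
    female = sentence_log_likelihood(f"她是一名{occupation}")
    return male - female

print(gender_bias_score("工程师"))    # a technical occupation
print(gender_bias_score("行政助理"))  # an administrative occupation
```

Because no token is replaced by [MASK], the full context is preserved, which is the property the abstract credits for UPL's stability under fine-tuning.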
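The subspace regularizer can likewise be sketched in a few lines. This is a hedged illustration under assumptions: the gender word pairs, the use of SVD on difference vectors to obtain the "dense subspace", and the weighting constant lambda_bias are all hypothetical stand-ins for the thesis's actual construction.

```python
# Sketch of a bias-subspace regularization term added to the fine-tuning loss.
import torch

def bias_subspace(gender_pairs: torch.Tensor, k: int = 1) -> torch.Tensor:
    """gender_pairs: (n_pairs, 2, hidden) paired gender-word embeddings.
    Returns a (k, hidden) basis of the dominant directions of the
    pairwise difference vectors (via SVD, an assumed choice)."""
    diffs = gender_pairs[:, 0] - gender_pairs[:, 1]
    diffs = diffs - diffs.mean(dim=0, keepdim=True)
    _, _, vh = torch.linalg.svd(diffs, full_matrices=False)
    return vh[:k]

def bias_regularizer(occupation_emb: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Penalize the component of each occupation embedding that lies inside
    the bias subspace, pushing occupations toward gender neutrality."""
    projections = occupation_emb @ basis.T           # (n_occ, k)
    return (projections ** 2).sum(dim=-1).mean()

# During fine-tuning (hypothetical weighting):
# loss = task_loss + lambda_bias * bias_regularizer(occupation_emb, basis)
```

The regularizer only touches the bias directions, which is consistent with the abstract's claim that normal semantics are preserved while the UPL bias value drops.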
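For the resume retrieval evaluation, the reported figures are average ranking gaps between male and female resumes. A minimal sketch of such a metric is shown below; the data layout (a ranked list of (resume_id, gender) pairs per query) is an assumption and is independent of DeText's actual API.

```python
# Sketch of an average rank-gap metric for gendered resume rankings.
from statistics import mean

def average_rank_gap(ranked_lists):
    """For each query, mean rank of male resumes minus mean rank of female
    resumes (rank 1 = top of the list), averaged over queries.
    A negative value means male resumes sit higher on average."""
    gaps = []
    for ranking in ranked_lists:
        male_ranks = [r for r, (_, g) in enumerate(ranking, 1) if g == "M"]
        female_ranks = [r for r, (_, g) in enumerate(ranking, 1) if g == "F"]
        if male_ranks and female_ranks:
            gaps.append(mean(male_ranks) - mean(female_ranks))
    return mean(gaps)

# Example: one query where both male resumes outrank both female resumes.
example = [[("r1", "M"), ("r2", "M"), ("r3", "F"), ("r4", "F")]]
print(average_rank_gap(example))   # -2.0 -> male resumes rank 2 places higher
```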