Research on Chinese Fairness Evaluation of Large Language Models

Author: Zhu Shucheng (朱述承)
  • Student ID
    2021******
  • Degree
    Doctoral
  • Email
    zhu******com
  • Defense date
    2025.05.20
  • Supervisor
    Liu Ying (刘颖)
  • Discipline
    Chinese Language and Literature
  • Pages
    162
  • Confidentiality level
    Public
  • Training unit
    069 School of Humanities
  • Chinese keywords
    大语言模型;公平性;评估;数据集
  • English keywords
    large language model; fairness; evaluation; dataset

Abstract

Large language models, trained on massive datasets, demonstrate remarkable capabilities in generating high-quality, coherent text. However, because the training data may contain biased, discriminatory, or otherwise unfair content, these models may inadvertently amplify or propagate such biases when generating or recognizing text. Such biases can not only impair user experience but also have profound effects on social equity and inclusivity. This study focuses on Chinese fairness evaluation of large language models, systematically investigating their fairness through dataset construction, evaluation methodology design, and comprehensive analysis. The main research contributions include: (1) Construction of a Chinese group-noun dataset. To delimit the objects of Chinese fairness evaluation for large language models, we built a dataset of 2,483 Chinese group nouns spanning 10 major categories and multiple subcategories. The dataset includes offensiveness annotation and analysis, revealing correlations between offensiveness and group characteristics, and interviews were conducted to explore the identity-constructing functions of offensive group nouns. (2) Evaluation of fairness in text generated by large language models. Using nationality bias as a case study, we evaluated the fairness of text generated by representative large language models, with a focused analysis of ChatGPT, through a methodology combining automated metrics, expert annotation, and model self-evaluation. Results indicate that while the generated texts are generally neutral, subtle biases emerge under comparative evaluation, reflecting value orientations similar to those in the real world. (3) Evaluation and optimization of prompt fairness. We proposed a framework that uses a large language model itself as the optimizer to evaluate and improve prompt fairness, and established a library of prompt templates. Findings show that large language models can iteratively optimize prompt fairness, that the optimization process is influenced by temperature and other settings, and that prompting styles such as chain-of-thought help produce less biased text, whereas a sloppy style may exacerbate bias in model outputs. (4) Evaluation of large language models' fairness-recognition capability. We constructed a multidimensional sentence-level fairness dataset of 20,000 sentences and evaluated several large language models, finding that they underperform specialized fine-tuned pretrained models on fairness-recognition tasks and that performance varies significantly with text genre. (5) Qualitative evaluation of large language model fairness. Taking gender identity construction as a case, we conducted a qualitative evaluation of DeepSeek and ChatGPT based on the metapragmatic-awareness framework. Both models exhibit some metapragmatic awareness in complex gender-identity-construction tasks but remain limited in understanding implicit gender bias. This study provides a systematic theoretical framework and practical methods for Chinese fairness evaluation of large language models, revealing their potential and limitations with respect to fairness and offering important reference points for future research.
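The iterative "model-as-optimizer" loop described in contribution (3) can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the function names (`fairness_score`, `rewrite_candidates`, `optimize_prompt`), the toy bias lexicon, and the rewrite rule are all assumptions standing in for what would, in a real setup, be calls to a large language model for both scoring and rewriting.

```python
# Hypothetical sketch of an LLM-as-optimizer loop for prompt fairness.
# The lexicon-based scorer and token-dropping rewriter are toy stand-ins
# for model-based scoring and model-proposed rewrites.

BIASED_TERMS = {"lazy", "criminal", "backward"}  # toy stand-in for a bias lexicon


def fairness_score(prompt: str) -> float:
    """Higher is fairer: fraction of tokens NOT in the toy bias lexicon."""
    tokens = prompt.lower().split()
    if not tokens:
        return 1.0
    biased = sum(t.strip(".,") in BIASED_TERMS for t in tokens)
    return 1.0 - biased / len(tokens)


def rewrite_candidates(prompt: str) -> list[str]:
    """Stand-in for asking the optimizer model to propose rewrites.
    Each candidate drops one flagged token from the prompt."""
    tokens = prompt.split()
    out = []
    for i, t in enumerate(tokens):
        if t.strip(".,").lower() in BIASED_TERMS:
            out.append(" ".join(tokens[:i] + tokens[i + 1:]))
    return out or [prompt]  # no flagged tokens: return the prompt unchanged


def optimize_prompt(prompt: str, rounds: int = 3) -> str:
    """Iteratively keep the fairest rewrite, mirroring the optimizer framework."""
    best = prompt
    for _ in range(rounds):
        candidates = rewrite_candidates(best)
        top = max(candidates, key=fairness_score)
        if fairness_score(top) <= fairness_score(best):
            break  # converged: no candidate improves the score
        best = top
    return best
```

In the real framework, the candidate generation and scoring would each be a model call (with the temperature setting the abstract mentions controlling the diversity of proposed rewrites), but the select-the-fairest-and-iterate structure is the same.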