多方学习的算法效用与公平性研究

Research on Utility and Fairness in Machine Learning with Multi-Party

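Author: 崔森 (Cui Sen)
  • Student ID
    2019******
  • Degree
    Doctorate
  • Email
    cui******.cn
  • Defense date
    2024.05.20
  • Advisor
    张长水 (Zhang Changshui)
  • Discipline
    Control Science and Engineering
  • Pages
    136
  • Confidentiality level
    Public
  • Department
    025 Department of Automation
  • Chinese keywords
    算法公平性;联邦学习;排序算法;多模态学习
  • English keywords
    Algorithmic Fairness; Federated Learning; Ranking; Multi-modal Learning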

Abstract

依赖大规模高质量数据,机器学习近几年得到了长足的发展。由于数据往往来自不同人群和不同机构,一个备受关注的研究问题是,算法在现实应用时是否会对群体带来系统性的歧视。本文从三个方面研究了多方学习的算法效用与公平性问题,分别是现实场景需要怎样的算法公平、统计异质下如何平衡算法效用与公平、数据异构下如何平衡算法效用与公平。关于第二方面,本文进一步从单机构多群体公平性和多机构多群体公平性两个方面进行了探索。

本文的主要贡献如下:

1. 提出了多方学习中基于个人意愿的公平性算法。该算法致力于让每个机构通过合作获得的效用提升与它们对模型学习所做出的贡献成比例。基于此,本文提出“合作均衡理论”,并证明了合作均衡状态的存在性。在合作均衡状态下,所有参与方都达到了一个公平的合作平衡点,在这个点上,没有任何一方可以通过改变自己的策略来单独提升自己的效用,确保了在多方学习中每个参与者都能在公平的基础上获得最大的效用提升。
2. 提出了一种基于动态规划的单机构公平学习算法。该方法用后处理的方式寻找排序学习中算法公平与准确率的一个更好的权衡,可被广泛用于矫正各种不完美的排序算法模型。相较于以往工作优化矫正函数的做法,该方法提出直接寻找不同群体最优序列的思路。与基线算法相比,该算法有更大的搜索空间,可以实现更高的算法公平,并可以维持更高的算法准确率。
3. 提出了一种基于多目标学习的多机构公平学习算法。现实合作场景中各个机构更关注模型在各自机构是否公平。本文将多方学习的效用公平和群体公平建模为一个带约束的多目标优化问题。本文证明了最优模型的存在性,并设计约束优化方法使得模型能够实现各个机构的效用公平和群体公平。与基线算法相比,该算法在各个机构有着更一致的效用表现和更高的公平性。
4. 提出了一种基于帕累托优化的多模态公平学习算法。现实中的数据往往是不完整的异构多模态数据。对于该不完整的多模态数据,本文致力于通过联合分析所有可见的异构模态数据复原出缺失模态。为了解决多模态学习中的统计异质性和模态异质性问题,本文寻找到可见模态和缺失模态的最优转换,算法实现了更公平一致的多机构缺失模态补全。

Driven by large-scale, high-quality data, machine learning has advanced significantly in recent years. Because data often originates from different populations and institutions, a pressing research question is whether algorithms systematically discriminate against particular groups or organizations when deployed in real-world applications. This thesis studies the utility and fairness of multi-party learning algorithms from three aspects: what kind of algorithmic fairness real-world scenarios require, how to balance utility and fairness under statistical heterogeneity, and how to balance utility and fairness under data heterogeneity. For the second aspect, the thesis further examines single-institution multi-group fairness and multi-institution multi-group fairness.

The main contributions of this thesis are as follows:

  • We introduce a fairness notion for multi-party learning, grounded in game theory and in the individual willingness of participants, under which the utility improvement each institution gains through collaboration is proportional to its contribution to model learning. Based on this, we propose the Cooperative Equilibrium Theory and prove the existence of a cooperative equilibrium state. In this state, all parties reach a fair balance point at which no party can improve its own utility by unilaterally changing its strategy, so each participant in multi-party learning obtains the maximum utility improvement on a fair basis.
  • We propose a single-institution fair learning algorithm based on dynamic programming. As a post-processing step, the method seeks a better trade-off between fairness and accuracy in learning to rank, and can be widely applied to correct imperfect ranking models. Unlike previous work that optimizes a correction function, it directly searches for the optimal sequence for each group. Compared with baseline algorithms, it has a larger search space, achieves lower algorithmic discrimination, and maintains higher accuracy.
  • We propose a multi-institution fair learning algorithm based on multi-objective learning. In real-world collaborations, each institution cares mainly about whether the model is fair within that institution. We therefore model utility fairness and group fairness in multi-party learning as a constrained multi-objective optimization problem, prove that an optimal model exists, and design a constrained optimization method that achieves both utility fairness and group fairness for every institution. Compared with baseline algorithms, our approach yields more consistent utility across institutions and lower algorithmic discrimination.
  • We propose a multi-modal fair learning algorithm based on Pareto optimization. Real-world data is often incomplete, heterogeneous, and multi-modal. We aim to reconstruct missing modalities by jointly analyzing all available heterogeneous modalities. To address statistical heterogeneity and modality heterogeneity in multi-modal learning, we find the optimal transformations between visible and missing modalities, which yields fairer and more consistent missing-modality completion across institutions.
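
As a rough illustration of the dynamic-programming post-processing idea in the second contribution, the Python sketch below interleaves two groups' scored items into a single ranking. It is not the thesis's algorithm, whose details are not given in this abstract. The function name `fair_merge`, the prefix-share constraint `min_share_b`, and the DCG-style position discount are all assumptions introduced here for illustration only.

```python
import math


def fair_merge(scores_a, scores_b, min_share_b):
    """Hypothetical sketch (not the thesis's method).

    Interleave items from groups A and B to maximize a DCG-style utility,
    subject to an assumed fairness constraint: every prefix of length k
    must contain at least floor(min_share_b * k) items from group B.
    Returns (order, utility), or None if the constraint cannot be met.
    """
    # Within each group, higher-scored items are always placed first.
    a = sorted(scores_a, reverse=True)
    b = sorted(scores_b, reverse=True)
    n, m = len(a), len(b)
    neg = float("-inf")

    # best[i][j]: best utility of a feasible prefix using i items of A, j of B.
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    pick = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            k = i + j  # length of the prefix, also the 1-indexed last position
            if k == 0 or j < math.floor(min_share_b * k):
                continue  # empty prefix, or prefix violates the constraint
            discount = 1.0 / math.log2(k + 1)
            if i > 0 and best[i - 1][j] > neg:
                cand = best[i - 1][j] + a[i - 1] * discount
                if cand > best[i][j]:
                    best[i][j], pick[i][j] = cand, "A"
            if j > 0 and best[i][j - 1] > neg:
                cand = best[i][j - 1] + b[j - 1] * discount
                if cand > best[i][j]:
                    best[i][j], pick[i][j] = cand, "B"

    if best[n][m] == neg:
        return None  # no full ranking satisfies the constraint

    # Walk back through the recorded choices to recover the ranking.
    order, i, j = [], n, m
    while i + j > 0:
        if pick[i][j] == "A":
            i -= 1
            order.append(("A", a[i]))
        else:
            j -= 1
            order.append(("B", b[j]))
    order.reverse()
    return order, best[n][m]
```

Calling `fair_merge([0.9, 0.8, 0.7], [0.4, 0.3], min_share_b=0.5)` returns the constrained ranking together with its utility. Setting `min_share_b=0.0` removes the constraint and recovers a plain sort by score. Searching directly over interleavings, rather than tuning a score-correction function, is the design choice this sketch is meant to echo.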