机器学习中基于轨迹的泛化分析:从信号与噪声的视角

Trajectory-based Generalization Analysis in Machine Learning: From a Signal and Noise Perspective

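Author: Jiaye Teng (滕佳烨)
  • Student ID
    2020******
  • Degree
    Doctoral
  • Email
    ado******com
  • Defense date
    2024.05.23
  • Advisor
    Yang Yuan (袁洋)
  • Discipline
    Computer Science and Technology
  • Pages
    153
  • Confidentiality level
    Public
  • Department
    047 Institute for Interdisciplinary Information Sciences (交叉信息院)
  • Chinese keywords
    泛化理论 (generalization theory); 机器学习 (machine learning); 信号 (signal); 噪声 (noise); 优化轨迹 (optimization trajectory)
  • English keywords
    generalization analysis; algorithmic stability; optimization trajectory; signal and noise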

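Abstract

Understanding the generalization ability of machine learning models is essential to ensuring their reliability in critical domains such as aerospace, finance, and medicine. Traditional generalization theory relies largely on complexity measures such as the VC dimension and Rademacher complexity. These methods, however, often fall short when explaining the generalization behavior of complex neural networks. Algorithmic stability emerged as an alternative to these traditional approaches, aiming to predict an algorithm's generalization ability from the degree to which it depends on the training data. Although algorithmic stability performs well under convex training, it still faces many challenges in non-convex training, of which neural network training is representative.

In this thesis, we propose using the optimization trajectory as algorithmic information to improve generalization analysis. Specifically, we incorporate the properties of the optimization trajectory related to signal fitting and noise fitting into generalization theory. Our study shows that a fine-grained analysis of the optimization trajectory can improve generalization bounds, identify effective generalization measures, and explore the complex relationship between hyperparameters and the generalization trajectory. The main contributions of this thesis are:

1. Conceptually, this thesis examines in depth the role of the optimization trajectory in generalization analysis. Based on the different mechanisms by which the optimization trajectory fits signal and noise, it offers a new angle for incorporating algorithmic information into generalization analysis.
2. Theoretically, this thesis (1) proposes a new generalization bound based on a signal-noise decomposition and proves that it outperforms existing methods under high signal-to-noise ratio conditions; (2) proposes new generalization measures based on the relative fitting speed of signal and noise, and supports their effectiveness with theoretical analysis; (3) analyzes how the degree of model overparameterization affects sensitivity to label noise and the generalization error curve.
3. Experimentally, through experiments on synthetic datasets and on real datasets such as CIFAR-10 and ImageNet, we verify the accuracy of the theoretical analysis and demonstrate the practical potential of optimization trajectory analysis for improving generalization.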

Understanding generalization in machine learning is crucial for enhancing reliability across domains such as aerospace, finance, and medicine. Traditional approaches to generalization rely on complexity measures such as the VC dimension and Rademacher complexity, yet they often fall short in explaining the remarkable generalization performance of complex neural networks. As an alternative, researchers have explored algorithmic stability to analyze training dynamics and derive algorithm-dependent generalization bounds. While algorithmic stability guarantees generalization in convex training scenarios under some conditions, it faces challenges in the non-convex landscapes of neural networks.

In this thesis, we focus on better leveraging algorithmic information for generalization analysis. Specifically, we examine the properties of the optimization trajectory associated with fitting signal and noise. Our work demonstrates that by finely analyzing these optimization trajectories, one can (a) enhance generalization bounds, (b) find effective generalization measures, and (c) explore the relationships between hyperparameters and generalization. Our contributions are summarized as follows.

1. We highlight the role of trajectory-based insights in generalization analysis. These insights reveal distinct mechanisms for fitting signal and noise and offer a novel approach to incorporating algorithmic information into generalization analysis.
2. Theoretical contributions include (a) a new generalization bound based on a decomposition framework with fine-grained signal and noise analysis, which outperforms existing bounds under high signal-to-noise ratio conditions; (b) novel generalization measures based on the relative fitting speeds of signal and noise, with theoretical insights; and (c) insights into how varying levels of overparameterization influence a model's resistance to label noise, leading to different generalization curves along the trajectory.
3. We conduct experiments on synthetic datasets and on real-world datasets such as CIFAR-10 and ImageNet, confirming the improvement in generalization analysis achieved by leveraging trajectory properties.