Neural networks have achieved remarkable success across many areas of industry and research. Despite their widespread use, however, their working mechanism remains largely a mysterious "black box", and our understanding of it is still limited. To study the generalization performance of neural networks, researchers frequently use the Neural Tangent Kernel (NTK) as an analytical tool. Unfortunately, previous studies lacked a proof of the uniform convergence of the kernels of wide neural networks, leaving the use of the NTK to study the generalization ability of neural networks without a rigorous foundation. This paper aims to fill this gap and investigates the uniform convergence of wide neural network kernels in depth.

First, for regression problems, we prove that for networks of fixed depth, the kernels of both fully connected and residual wide networks converge uniformly. Leveraging this uniform convergence, we establish that the generalization performance of a wide neural network is close to that of the corresponding neural tangent kernel regression. This provides theoretical support for studying neural networks through the generalization behavior of NTK regression and supplies a critical missing step in many earlier arguments.

Building on these convergence results, we further analyze the generalization ability of wide neural networks in several settings. We find that when the target function lies in the reproducing kernel Hilbert space associated with the NTK, multilayer wide neural networks can achieve optimal generalization performance. At the same time, we show that overfitted multilayer neural networks may suffer degraded generalization performance. In addition, using the spectral properties of the NTK on general domains, we extend the above results to unbounded domains.

We then extend the analysis to the classification setting. We find that, within a finite training time, the kernels of wide neural networks trained for classification still converge uniformly to the corresponding neural tangent kernel; however, when the training time tends to infinity, convergence cannot be guaranteed for certain network architectures. Finally, we briefly examine the behavior of the neural tangent kernel as the network depth tends to infinity and, through theoretical analysis and experimental results, suggest that residual networks may perform better when the scaling factor decays rapidly with depth.
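To make the central object of the abstract concrete, the following is a schematic LaTeX sketch of the empirical kernel and of the uniform-convergence statement referred to above; the notation (network $f$, width $m$, training time $t$, horizon $T$, domain $\mathcal{X}$) is assumed here for illustration and is not taken from the thesis itself.

% Schematic sketch (notation assumed, not from the thesis).
% Empirical, time-dependent kernel of a width-m network f(x; \theta_t):
\[
  K^{(m)}_t(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta_t),\, \nabla_\theta f(x';\theta_t) \big\rangle .
\]
% Its infinite-width limit at initialization is the neural tangent kernel:
\[
  K^{\mathrm{NTK}}(x, x') \;=\; \lim_{m \to \infty} K^{(m)}_0(x, x') .
\]
% "Uniform convergence of the wide-network kernel" then refers to a statement of the form
\[
  \sup_{t \in [0,T]} \; \sup_{x, x' \in \mathcal{X}}
  \big| K^{(m)}_t(x, x') - K^{\mathrm{NTK}}(x, x') \big| \;\longrightarrow\; 0
  \qquad (m \to \infty),
\]
% holding with high probability over the random initialization; such a bound is what allows
% the generalization error of the trained wide network to be compared with that of NTK regression.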