

高效卷积神经网络的结构设计与优化

Designing and Optimizing Structures for Efficient Convolutional Neural Networks

Author: 丁霄汉
  • Student ID
    2017******
  • Degree
    Doctoral
  • Email
    sha******com
  • Defense Date
    2022.05.19
  • Advisor
    丁贵广
  • Discipline
    Software Engineering
  • Pages
    120
  • Confidentiality Level
    Public
  • Training Unit
    410 School of Software
  • Chinese Keywords
    卷积神经网络, 主干模型, 架构设计, 模型压缩, 结构重参数化
  • English Keywords
    Convolutional Neural Network, Backbone, Architectural Design, Model Compression, Structural Re-parameterization

Abstract


The mainstream solution to multiple computer vision tasks (e.g., image classification, object detection, and semantic segmentation) is to use a Convolutional Neural Network (CNN) as the backbone to extract features from the input images or videos and then process the features in different ways. Therefore, improving the accuracy and efficiency of CNNs by designing better structures can benefit a wide range of vision tasks. Focusing on designing and optimizing structures for efficient CNNs, this thesis contributes in three aspects: architectural design, novel components, and model compression methods. In practice, these three aspects are closely related, since real-world applications typically require pursuing the highest possible accuracy under certain constraints on inference costs. Therefore, a developer may adopt a new architecture, use novel components to improve an existing architecture, or compress (e.g., prune) a bigger and better-performing model into a smaller one that meets the given efficiency constraints. At the architectural level, to realize a high degree of parallelism and high throughput, we propose RepVGG, an extremely simple and efficient CNN architecture. Unlike the modern mainstream multi-path architectures, RepVGG is single-path: it has no branches and comprises only 3x3 convolutions. To improve the accuracy of such a simple architecture, we propose a methodology named Structural Re-parameterization, which equivalently converts a complicated training-time model into a simple inference-time model. With Structural Re-parameterization, RepVGG reaches accuracy comparable to that of the latest complicated models. At the component level, to improve the accuracy of a CNN without changing its overall architecture or inference-time structure, we propose a powerful building block to replace ordinary convolutional layers: the Asymmetric Convolution Block (ACB).
ACB uses asymmetric convolutions to strengthen regular convolutional layers; because the asymmetric convolutions can be equivalently merged into the regular layers, it introduces no extra inference-time cost. To fill the gap in the literature on the design of large convolution kernels and to rediscover kernel size as a vital design dimension, we propose another building block: the Re-parameterized Large Kernel Block (RepLKB). RepLKB adopts key design elements such as very large kernels and identity shortcuts to significantly boost the accuracy and efficiency of CNNs, especially on downstream tasks, highlighting a new CNN design paradigm: using a few large kernels rather than stacking many small ones. At the compression level, to overcome the common drawbacks of traditional methods, such as significant accuracy drops and the need for fine-tuning, we propose Centripetal SGD, a method based on a modified optimization process. By changing the gradient-descent update rule, it produces a special redundancy pattern and solves the problem of constrained filter pruning in deep models with complicated structures. Furthermore, to realize high-accuracy channel pruning, inspired by the remembering and forgetting mechanisms of the human brain, we propose ResRep, a channel pruning method based on the transformation of model structures. Using Structural Re-parameterization, ResRep achieves state-of-the-art results by decoupling remembering and forgetting in the pruning process.
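The equivalence underlying Structural Re-parameterization — and hence both RepVGG's branch fusion and ACB's asymmetric-kernel fusion — is simply the linearity of convolution: parallel branches whose kernels can be zero-padded to a common size add up to a single kernel. A minimal single-channel NumPy sketch of this identity (the `conv2d` helper is illustrative, not code from the thesis):

```python
import numpy as np

def conv2d(x, k):
    """'Same'-padded single-channel 2D convolution (cross-correlation)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))

# RepVGG-style training-time branches: 3x3 conv + 1x1 conv + identity shortcut.
k1 = rng.standard_normal((1, 1))
y_branches = conv2d(x, k3) + conv2d(x, k1) + x
k_fused = k3.copy()
k_fused[1, 1] += k1[0, 0] + 1.0   # 1x1 kernel and identity both live at the centre
assert np.allclose(y_branches, conv2d(x, k_fused))

# ACB-style branches: 3x3 conv + horizontal 1x3 conv + vertical 3x1 conv.
k13 = rng.standard_normal((1, 3))
k31 = rng.standard_normal((3, 1))
y_acb = conv2d(x, k3) + conv2d(x, k13) + conv2d(x, k31)
k_acb = k3.copy()
k_acb[1, :] += k13[0, :]          # 1x3 kernel occupies the middle row
k_acb[:, 1] += k31[:, 0]          # 3x1 kernel occupies the middle column
assert np.allclose(y_acb, conv2d(x, k_acb))
```

After fusion, only the single 3x3 kernel is kept for inference, which is why neither RepVGG's extra branches nor ACB's asymmetric layers add any inference-time cost.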
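The core idea of Centripetal SGD can be illustrated in a few lines: filters assigned to the same cluster receive the averaged task gradient plus a term pulling them toward the cluster mean, so their difference decays geometrically and all but one filter per cluster can eventually be removed. A simplified NumPy sketch — the update rule here paraphrases the idea and is not the thesis's exact formula; the constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two 3x3 filters assigned to the same cluster (toy setup).
w = rng.standard_normal((2, 3, 3))
lr, eps = 0.1, 0.05   # learning rate and centripetal strength (illustrative)

for step in range(200):
    # Stand-in for the task gradients; in real training these come from backprop.
    g = rng.standard_normal((2, 3, 3))
    g_avg = g.mean(axis=0, keepdims=True)    # gradient shared across the cluster
    w_mean = w.mean(axis=0, keepdims=True)
    # Simplified centripetal update: shared gradient + pull toward the cluster mean.
    w = w - lr * g_avg - eps * (w - w_mean)

# The two filters have (almost) collapsed onto each other, so one is redundant.
gap = np.abs(w[0] - w[1]).max()
assert gap < 1e-2
```

Because the shared-gradient part is identical for both filters, the inter-filter difference shrinks by a factor of (1 - eps) per step, creating the redundancy pattern that makes pruning lossless without fine-tuning.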
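The structural side of ResRep rests on another re-parameterization identity: a convolution followed by a 1x1 channel-mixing layer (the "compactor") folds into a single convolution, so rows of the compactor driven to zero during training translate into removed output channels at inference time. A sketch with a naive multi-channel convolution (helper and variable names are illustrative):

```python
import numpy as np

def conv2d_mc(x, k):
    """'Same'-padded multi-channel conv: x is (Cin, H, W), k is (Cout, Cin, kh, kw)."""
    cout, cin, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    _, H, W = x.shape
    out = np.zeros((cout, H, W))
    for o in range(cout):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * k[o])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 6))       # 2 input channels
k = rng.standard_normal((4, 2, 3, 3))    # 3x3 conv with 4 output channels
m = rng.standard_normal((3, 4))          # 1x1 "compactor": 4 -> 3 channels

# Two-step computation: conv, then 1x1 channel mixing.
y = conv2d_mc(x, k)
z_two_step = np.einsum('po,ohw->phw', m, y)

# Folded computation: merge the compactor into the conv kernel.
k_folded = np.einsum('po,ocij->pcij', m, k)
z_folded = conv2d_mc(x, k_folded)
assert np.allclose(z_two_step, z_folded)
```

This is what lets ResRep decouple "remembering" (the original conv keeps learning the task) from "forgetting" (only the compactor is penalized toward zero): the final pruned model is recovered by folding, with no accuracy lost in the transformation itself.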