Quantization Methods of Deep Neural Networks for Visual Object Recognition

Author: Ziwei Wang (王子为)
  • Student ID
    2018******
  • Degree
    Doctoral (Ph.D.)
  • Email
    wan******.cn
  • Defense date
    2023.05.21
  • Advisor
    Jiwen Lu (鲁继文)
  • Discipline
    Control Science and Engineering
  • Pages
    129
  • Confidentiality level
    Public
  • Department
    025 Department of Automation
  • Keywords
    Network quantization, visual object recognition, channel-wise interaction, information bottleneck, precision prediction

Abstract

Deep neural networks achieve promising performance on a wide variety of computer vision tasks, but they suffer from large storage footprints and slow computation, which prevents their deployment on resource-limited platforms such as smartphones and robots. In recent years, network quantization techniques have reduced the storage and computational cost of deep models and significantly enhanced their practicality on embedded devices. However, existing network quantization methods still face bottlenecks in task performance and deployment efficiency in realistic visual object recognition applications: (1) the pixel signs of the feature maps produced by binary and real-valued convolutions are inconsistent, so quantization noise in the network architecture causes severe information loss; (2) quantized models of limited capacity carry a large amount of redundant information, and the objective function fails to fully exploit the network capacity for effective information representation; (3) quantization strategies do not transfer across deployment scenarios, so the network parameters only achieve satisfactory performance in a single scenario with a fixed data distribution; (4) searching for optimal bitwidths requires evaluating many quantization strategies for feedback, and the training needed to estimate the accuracy of each candidate quantized network wastes substantial computational resources.

To address these four issues, this thesis studies channel-wise interaction mining, task complexity modeling, spatial attention transfer, and network accuracy prediction, and proposes deep neural network quantization methods for visual object recognition. The main contributions are as follows:

1. To address the weak robustness of network architectures to quantization noise, we propose low-loss network quantization via activation modification. The method uses reinforcement learning to mine the implicit channel-wise interactions of convolutional networks and designs an offset operation for binary feature maps based on the channel-wise priors, greatly reducing the information loss caused by quantization noise while preserving the storage and computational efficiency of binary networks (see Sketch 1 below).

2. To address the insufficient exploitation of network capacity by the objective function, we propose low-redundancy network quantization based on a variable information bottleneck. The method extends information bottleneck theory to quantized network inference to remove redundant information, and adapts the bottleneck to the complexity of each visual task, ensuring maximal utilization of model capacity without information redundancy (see Sketch 2 below).

3. To address the weak generalization of network parameters across deployment environments, we propose highly generalizable network quantization via attribution rank preservation. The method learns a transferable quantized model by enforcing spatial attention consistency between the quantized and full-precision networks, enabling deployment of the quantization framework across scenarios with arbitrary data distributions (see Sketch 3 below).

4. To address the heavy computational cost of the training process, we propose low-cost network quantization based on accuracy regression prediction. The method learns a network that predicts the accuracy of quantization strategies, markedly reducing the cost of evaluation, and samples uncertain quantization strategies to quickly obtain an accurate predictor, enabling efficient training of mixed-precision quantized models (see Sketch 4 below).
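Sketch 1 (channel-prior offset before binarization). A minimal, illustrative sketch, not the thesis implementation: activations are binarized with a straight-through estimator, and a learned per-channel offset computed from channel statistics stands in for the reinforcement-learning-mined channel interactions described above. All names (BinarySign, ChannelOffsetBinarize) are hypothetical.

```python
# Illustrative only: a learned channel-wise offset shifts ambiguous
# pre-binarization values so fewer pixels flip sign relative to the
# real-valued feature map. The RL-based channel-interaction mining from
# the abstract is replaced by a simple learned mixing layer.
import torch
import torch.nn as nn

class BinarySign(torch.autograd.Function):
    """sign() with a straight-through gradient estimator."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1 (hard-tanh clipping).
        return grad_out * (x.abs() <= 1).float()

class ChannelOffsetBinarize(nn.Module):
    """Binarize activations after adding a channel-prior offset.

    The offset for each channel is predicted from the per-channel means
    of all channels, so information from correlated channels can push
    ambiguous pixels toward the correct sign before binarization.
    """
    def __init__(self, channels):
        super().__init__()
        self.mix = nn.Linear(channels, channels)  # channel-interaction prior

    def forward(self, x):                  # x: (N, C, H, W), real-valued
        prior = x.mean(dim=(2, 3))         # (N, C) per-channel statistics
        offset = self.mix(prior)           # (N, C) learned channel-wise offset
        x = x + offset[:, :, None, None]   # shift activations before sign()
        return BinarySign.apply(x)         # (N, C, H, W) in {-1, +1}

if __name__ == "__main__":
    layer = ChannelOffsetBinarize(channels=16)
    feat = torch.randn(2, 16, 8, 8, requires_grad=True)
    out = layer(feat)
    out.sum().backward()                   # STE lets gradients reach `feat`
    print(out.unique())                    # tensor([-1., 1.])
```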
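Sketch 2 (variable information bottleneck). A standard variational information bottleneck head, assuming PyTorch; the scalar `beta` stands in for the thesis's task-complexity-adaptive adjustment, which is not reproduced here. `VIBHead` and `vib_loss` are hypothetical names.

```python
# The KL term removes redundant information from the representation;
# a larger beta tightens the bottleneck. Adapting beta to task
# complexity (as in the thesis) is simplified here to a fixed scalar.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBHead(nn.Module):
    def __init__(self, in_dim, z_dim, num_classes):
        super().__init__()
        self.encoder = nn.Linear(in_dim, 2 * z_dim)  # predicts (mu, logvar)
        self.classifier = nn.Linear(z_dim, num_classes)

    def forward(self, feats):
        mu, logvar = self.encoder(feats).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        logits = self.classifier(z)
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
        return logits, kl

def vib_loss(logits, kl, labels, beta):
    # Larger beta -> tighter bottleneck -> less redundancy (and capacity).
    return F.cross_entropy(logits, labels) + beta * kl

if __name__ == "__main__":
    head = VIBHead(in_dim=64, z_dim=32, num_classes=10)
    feats = torch.randn(8, 64)             # e.g. features from a quantized backbone
    labels = torch.randint(0, 10, (8,))
    logits, kl = head(feats)
    loss = vib_loss(logits, kl, labels, beta=1e-3)
    loss.backward()
    print(float(loss))
```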
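Sketch 3 (attribution rank preservation). A hedged sketch of a saliency-rank loss: spatial attention is taken as the channel-wise mean of squared activations, and a pairwise hinge encourages the quantized network to order spatial locations by saliency the same way the full-precision network does. The pairwise hinge is an assumed instantiation, not necessarily the thesis's exact formulation.

```python
# Rank preservation between quantized and full-precision attention maps:
# sample random location pairs, read the teacher's ordering, and penalize
# pairs whose order the quantized network reverses.
import torch
import torch.nn.functional as F

def spatial_attention(feat):
    # feat: (N, C, H, W) -> (N, H*W), L2-normalized attention per image.
    att = feat.pow(2).mean(dim=1).flatten(1)
    return F.normalize(att, dim=1)

def rank_preserving_loss(feat_q, feat_fp, num_pairs=256, margin=0.0):
    att_q = spatial_attention(feat_q)        # quantized network attention
    att_fp = spatial_attention(feat_fp)      # full-precision (teacher) attention
    n, hw = att_q.shape
    i = torch.randint(0, hw, (n, num_pairs))
    j = torch.randint(0, hw, (n, num_pairs))
    # +1 if the teacher says location i is more salient than j, else -1.
    target = torch.sign(att_fp.gather(1, i) - att_fp.gather(1, j))
    diff = att_q.gather(1, i) - att_q.gather(1, j)
    # Hinge penalizes pairs whose order disagrees with the teacher's.
    return F.relu(margin - target * diff).mean()

if __name__ == "__main__":
    fq = torch.randn(2, 16, 8, 8, requires_grad=True)   # quantized features
    ffp = torch.randn(2, 16, 8, 8)                      # full-precision features
    loss = rank_preserving_loss(fq, ffp)
    loss.backward()
    print(float(loss))
```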
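Sketch 4 (accuracy prediction with uncertainty sampling). A minimal active-learning loop, assuming PyTorch: a small ensemble regresses accuracy from a per-layer bitwidth vector, and the strategies with the highest ensemble disagreement are selected for real evaluation. The `evaluate_strategy` stub replaces the expensive train-and-validate step; all names and the toy accuracy function are hypothetical.

```python
# Active learning for mixed-precision search: strategies on which the
# predictor ensemble disagrees most (highest variance) are labeled by
# real evaluation, then the ensemble is refit on all labeled strategies.
import torch
import torch.nn as nn

NUM_LAYERS, BITS = 8, (2, 4, 8)

def make_predictor():
    return nn.Sequential(nn.Linear(NUM_LAYERS, 64), nn.ReLU(), nn.Linear(64, 1))

def sample_strategies(n):
    # Random per-layer bitwidth assignments, encoded as float features.
    idx = torch.randint(0, len(BITS), (n, NUM_LAYERS))
    return torch.tensor(BITS, dtype=torch.float)[idx]

def evaluate_strategy(s):
    # Placeholder for the expensive step: train + validate a quantized net.
    return (s.mean() / max(BITS)).item()  # toy "accuracy", not a real metric

ensemble = [make_predictor() for _ in range(4)]
pool, labeled, targets = sample_strategies(512), [], []
for step in range(5):
    with torch.no_grad():
        preds = torch.stack([m(pool).squeeze(-1) for m in ensemble])
    uncertain = preds.var(dim=0).topk(8).indices        # most disputed strategies
    for s in pool[uncertain]:
        labeled.append(s); targets.append(evaluate_strategy(s))
    x, y = torch.stack(labeled), torch.tensor(targets)
    for m in ensemble:                                  # refit each member
        opt = torch.optim.Adam(m.parameters(), lr=1e-2)
        for _ in range(100):
            opt.zero_grad()
            loss = nn.functional.mse_loss(m(x).squeeze(-1), y)
            loss.backward(); opt.step()
with torch.no_grad():
    best = pool[torch.stack([m(pool).squeeze(-1) for m in ensemble]).mean(0).argmax()]
print("predicted-best bitwidths:", best.tolist())
```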