Image compression is a classic problem in computer vision, with important applications in video playback, social media, image storage, and information transmission. It aims to reduce the cost of transmitting and storing images and video so that more content fits in the same space, enabling efficient transmission and storage of information. Lossless image compression compresses an image without any loss: the image is identical before compression and after decompression, that is, the value of every pixel is preserved. It has important applications in medical imaging and high-resolution geographic information maps. In recent years, with the development of deep learning, lossless image compression algorithms based on deep generative models have flourished and attracted considerable attention in academia. A deep generative model can probabilistically model an entire image and predict the distribution of each pixel; combined with an entropy coder, this yields lossless compression (a minimal numerical illustration of this link follows the abstract). However, existing algorithms improve only the compression ratio and remain impractical in terms of speed: mainstream generative-model-based lossless compressors reach a throughput of only about 1 MB/s, whereas a practical algorithm needs a compression throughput above 100 MB/s. How to make generative-model-based lossless image compression practical is therefore a key research question, and efficient algorithms are of great significance for industrial applications. This thesis studies the practicality of generative-model-based lossless image compression from two angles: a lightweight deep generative model architecture and the design of an efficient entropy coder. On the one hand, a lightweight model speeds up probability modeling; on the other hand, an efficient entropy coder can run its inference on the graphics processing unit (GPU), further improving speed. The main contributions of this thesis are as follows:

• Targeting the inference-time bottleneck of deep generative models, this thesis designs a residual prediction network that predicts the distribution of residuals, reducing model complexity and improving inference throughput. The framework combines the strengths of two families of generative models, autoregressive models and vector-quantized variational autoencoders (VQ-VAE), striking a balance between compression ratio and compression throughput. Compared with previous models, the proposed model improves throughput by an order of magnitude and can be deployed in real-world scenarios.

• Targeting the speed bottleneck of the dynamic entropy coder in lossless compression, this thesis designs a coding algorithm based on asymmetric numeral systems (ANS) that precomputes intermediate results into lookup tables, further raising the coder's compression and decompression throughput (a simplified table-based sketch is given after this abstract). The algorithm is also optimized for the GPU, eliminating most of the data transfer between the CPU and the GPU and further improving the throughput of the whole framework.
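To make the link between probability modeling and code length concrete, the following is a minimal illustrative sketch, not the thesis code; the function and array names are hypothetical and the shapes are assumptions for the example. An ideal entropy coder spends -log2(p) bits on a pixel to which the model assigned probability p, so sharper per-pixel predictions directly shorten the compressed output.

# Illustrative sketch only (not the thesis code): it shows how a learned
# per-pixel probability model determines the lossless code length. The
# names (ideal_code_length_bits, probs) are hypothetical placeholders.
import numpy as np

def ideal_code_length_bits(image, probs):
    """Bits needed by an ideal entropy coder, given per-pixel
    predicted distributions over the 256 possible byte values.

    image: (H, W) uint8 array of pixel values.
    probs: (H, W, 256) array; probs[i, j, v] is the model's predicted
           probability that pixel (i, j) takes value v.
    """
    h, w = image.shape
    # Probability the model assigned to the value each pixel actually took.
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], image]
    # Shannon code length: -log2(p), summed over all pixels.
    return float(-np.log2(p_true).sum())

# Sanity check with a uniform "model": 8 bits per pixel, i.e. no compression.
img = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
uniform = np.full((4, 4, 256), 1.0 / 256)
assert abs(ideal_code_length_bits(img, uniform) - 8 * 16) < 1e-6

This is why a lighter model that still predicts accurate residual distributions can raise throughput without sacrificing compression ratio: the coder's output size depends only on the quality of the predicted probabilities, not on how expensive the model that produced them was.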
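The table-based decoding of the second contribution can be illustrated with a deliberately simplified range-ANS (rANS) sketch. This is a teaching version under strong assumptions: a big-integer state, a toy three-symbol alphabet, and no streaming renormalization or GPU parallelism. It is not the thesis's implementation; it only shows how precomputing the slot-to-symbol table removes the per-symbol search from the decoding loop.

# Minimal range-ANS (rANS) sketch, illustrating only the lookup-table idea:
# the slot-to-symbol mapping is precomputed once, so each decoding step is
# a constant-time table access plus integer arithmetic. Simplified teaching
# version (big-integer state, no renormalization), not the thesis's code.

PROB_BITS = 4                      # quantized frequencies sum to 2**PROB_BITS
MASK = (1 << PROB_BITS) - 1

freqs = {'a': 8, 'b': 4, 'c': 4}   # toy quantized symbol frequencies
cdf, acc = {}, 0
for s, f in freqs.items():         # cumulative frequency table
    cdf[s], acc = acc, acc + f

# Precomputed lookup table: decode slot -> symbol, one entry per slot.
slot2sym = [None] * (1 << PROB_BITS)
for s, f in freqs.items():
    for slot in range(cdf[s], cdf[s] + f):
        slot2sym[slot] = s

def encode(symbols):
    x = 1
    for s in reversed(symbols):    # encode in reverse so decoding
        x = (x // freqs[s] << PROB_BITS) + (x % freqs[s]) + cdf[s]
    return x                       # yields the forward symbol order

def decode(x, n):
    out = []
    for _ in range(n):
        s = slot2sym[x & MASK]     # O(1) table lookup, no search
        out.append(s)
        x = freqs[s] * (x >> PROB_BITS) + (x & MASK) - cdf[s]
    return out

msg = list('abacabca')
assert decode(encode(msg), len(msg)) == msg

Because every decoding step then reduces to array lookups and integer arithmetic on an independent coder state, many states can be processed in parallel, which is what makes a GPU implementation of the coder attractive and avoids shuttling intermediate data back to the CPU.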