With the continuous development of information technologies represented by multimedia and the Internet, more and more content is being created and disseminated widely on the network, and people are paying increasing attention to their own aesthetic experience. Using artificial intelligence to automatically make aesthetic decisions or to generate aesthetically pleasing works has broad application prospects in daily entertainment and industrial production. This paper focuses on computational aesthetics for images and, from the perspective of computability and practicality, studies image aesthetics assessment and image aesthetic enhancement algorithms. With the progress of deep learning, the fields of image aesthetics assessment and enhancement have developed rapidly. However, aesthetics is a subjective feeling with high-level semantics and remains a challenging and open research topic. To address the problems of mainstream image aesthetics assessment methods, which struggle to extract critical global aesthetic representations and cannot learn from the real data distribution, we draw on the latest advances in computer vision and related fields and propose an effective and general method for learning aesthetic representations. In addition, for aesthetic enhancement, this paper focuses on video-oriented and personalized image color enhancement and proposes a series of algorithms that improve and extend existing techniques. In summary, the research content of this paper covers the following three aspects:

1. This paper proposes an algorithm based on a layout-aware graph convolutional neural network. The algorithm captures global information through hierarchical graph reasoning in the coordinate space and the latent space. Meanwhile, with the help of the adjacency matrix of the graph structure, the original aspect ratio of the input image is fused into the message passing among nodes, so that the model can learn from the real data distribution. Experiments show that, compared with existing methods, the proposed method learns more effective general image aesthetic representations.

2. This paper proposes a Look-up-Table-based image-to-video aesthetic enhancement algorithm. The algorithm provides a framework for real-time video enhancement that is trained only on image enhancement datasets. It uses a virtual physics engine to infer plausible motion fields from static images and imposes a temporal consistency constraint so that the network learns inter-frame stability from them (a minimal sketch of such a constraint follows this abstract). Extensive experiments and a user study show that, compared with existing algorithms, the proposed algorithm produces smoother and more stable video enhancement and reaches 797 FPS on 4K-resolution video.

3. This paper proposes a multi-style aesthetic enhancement algorithm based on a conditional invertible neural network. The algorithm builds a bidirectional mapping between the style distribution space and a simple distribution via an invertible network to learn the complex one-to-many mapping in multi-style enhancement. It encodes styles with an invertible style-vector encoding module and performs real-time enhancement with a conditional enhancement module. Experiments show that the proposed method achieves diverse multi-style enhancement, that the constructed style space is continuous, and that it supports user-friendly editing.
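To make the temporal consistency idea in the second contribution concrete, the following minimal PyTorch sketch warps the enhanced next frame back along a motion field and penalizes its per-pixel difference from the enhanced current frame. The function name, tensor shapes, flow convention, and optional occlusion mask are illustrative assumptions, not details taken from the thesis.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(enhanced_t, enhanced_tp1, flow, occlusion_mask=None):
    """Hedged sketch of a temporal consistency constraint.

    enhanced_t, enhanced_tp1: (B, 3, H, W) enhanced frames at times t and t+1.
    flow: (B, 2, H, W) motion field mapping pixels of frame t to frame t+1 (in pixels).
    occlusion_mask: optional (B, 1, H, W) mask of valid (non-occluded) pixels.
    """
    b, _, h, w = enhanced_t.shape
    # Build a normalized sampling grid shifted by the flow for grid_sample.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=flow.device, dtype=flow.dtype),
        torch.arange(w, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    grid_x = (xs[None] + flow[:, 0]) / (w - 1) * 2 - 1
    grid_y = (ys[None] + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)          # (B, H, W, 2)
    # Warp the enhanced frame t+1 back to time t and compare.
    warped_tp1 = F.grid_sample(enhanced_tp1, grid, align_corners=True)
    diff = (enhanced_t - warped_tp1).abs()
    if occlusion_mask is not None:
        diff = diff * occlusion_mask
    return diff.mean()
```

In this sketch the motion field plays the role of the flow inferred from static images, and minimizing the loss encourages the enhancement to change smoothly between consecutive frames.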
With the development of information technology, more and more content is created and spread widely on the Internet, and people have begun to pay more attention to their own aesthetic experience. Automatically making aesthetic decisions or generating aesthetically pleasing works with AI has broad application prospects in entertainment and industrial production. This paper focuses on computational aesthetics for images, studying image aesthetics assessment and image aesthetic enhancement from the perspective of computability and practicability. In recent years, research on image aesthetics has developed rapidly. However, aesthetics is a subjective feeling with high-level semantics, and it remains a challenging and open research problem. Existing CNN-based methods have limitations in capturing the relations among distant regions and in optimizing networks on the real data distribution. To address these problems, this paper proposes a framework for learning robust image aesthetic representations with new technical solutions. For the enhancement task, this paper focuses on video-oriented and personalized image color enhancement, and presents a series of algorithms to improve and extend existing methods. In summary, the research content of this paper includes the following three aspects:

1. This paper proposes an algorithm based on a layout-aware graph convolutional neural network. The algorithm captures global information through hierarchical graph reasoning in the coordinate space and the latent space. With the help of the adjacency matrix of the graph structure, the original aspect-ratio information of the input image is fused into the message passing among nodes, so that the model can learn from the real data distribution. Experiments show that, compared with existing methods, the proposed method learns more effective general image aesthetic representations.

2. This paper proposes an image-to-video aesthetic enhancement algorithm based on the Look-up Table. The algorithm provides a framework for real-time video enhancement that is trained on image datasets and tested on videos. A virtual physics engine is applied to infer motion-field information from static images, and the network learns inter-frame stability through a temporal consistency constraint. Quantitative experiments and a user study show that, compared with existing algorithms, the proposed algorithm generates smoother and more stable results, and its processing speed on 4K-resolution video reaches 797 FPS.

3. This paper proposes a multi-style aesthetic enhancement algorithm based on a conditional Invertible Neural Network (INN). The algorithm uses an INN to construct a bidirectional mapping between the style distribution space and a simple distribution, so as to learn the complex one-to-many mapping of multi-style enhancement. An invertible style-vector encoder is used to encode styles, and a conditional enhancement module performs real-time enhancement. Experiments show that the proposed method achieves diverse multi-style enhancement, that the constructed style space is continuous, and that its editing is user-friendly.
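As a hedged illustration of the bidirectional mapping used in the third contribution, the sketch below implements a single affine-coupling block in PyTorch, the kind of invertible component from which an INN can be built. The class name, layer sizes, and split scheme are assumptions for illustration and are not taken from the thesis, which additionally conditions the network and encodes styles.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling block: an invertible map whose forward and inverse
    passes share the same small network (illustrative sketch only)."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        # Predicts a log-scale and shift for the second half from the first half.
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * torch.exp(log_s) + t          # forward: style vector -> latent
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(y1).chunk(2, dim=1)
        x2 = (y2 - t) * torch.exp(-log_s)       # inverse: latent -> style vector
        return torch.cat([y1, x2], dim=1)


# Usage sketch: map a style vector to a simple latent and recover it exactly.
block = AffineCoupling(dim=8)
style = torch.randn(4, 8)
latent = block(style)
recovered = block.inverse(latent)
assert torch.allclose(style, recovered, atol=1e-5)
```

Because each coupling block is exactly invertible, stacking such blocks yields a bijection between a style distribution and a simple latent distribution, which is the general mechanism that makes sampling diverse styles and editing in a continuous style space possible.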