基于小样本的无监督字体风格化算法研究

Research on Few-shot Unsupervised Font Stylization Algorithm

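Author: Zhang Yujun (张玉君)
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    zyj******com
  • Defense date
    2023.05.20
  • Supervisor
    Zhang Hui (张慧)
  • Discipline
    Software Engineering
  • Pages
    88
  • Confidentiality level
    Public
  • Department
    410 School of Software
  • Chinese keywords
    深度学习,无监督,字体风格分类,风格迁移,少样本
  • English keywords
    Deep Learning, Unsupervised, Font Style Classification, Style Transfer, Few-shot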

Abstract

Fonts are a core concept in design. Automatically identifying the font used in an image, so that a specific font can be found and reused, and automatically building a complete font library for a given style have long been goals for designers. Font stylization research also has practical value in applications such as forensic handwriting identification and the reproduction of historical relics. Font style nevertheless remains a challenging research problem. Existing methods usually rely on supervised learning, which requires large amounts of labeled or paired data that are expensive to collect. This thesis therefore studies unsupervised font stylization algorithms in the few-shot setting, with the aim of training font style classification and font style transfer models at a lower data acquisition cost.

For few-shot unsupervised font style classification, this thesis proposes FontNet, a new multi-branch font style classification network that classifies fonts in an unsupervised manner from only a few training samples. One bottleneck in transferring existing image classification networks to font classification is that the network relies too heavily on the contours in the features when classifying font styles. To address this, the thesis proposes LineShuffle, a new row-column shuffling method that destroys the global contour of the features while preserving stroke structure information. Because some languages contain very few characters, the network must also cope with the overfitting that can occur when the training set is small. The thesis proposes the Font Partial Attention (FPA) module, which crops the most stylized parts of a character without supervision so that the network learns local character style. The FPA module also enriches the variety of the training samples.

Experiments validate the effectiveness and generalization ability of FontNet. With only a few training samples, the proposed method reaches an accuracy of approximately 77%, outperforming existing font style classification and image classification methods.

The thesis further transfers the font style classifier to the font style transfer task, where it serves as the style encoder. It proposes CDGFont, a few-shot font style transfer model with an encoder-decoder structure that is trained in an unsupervised manner and needs only one reference style sample to render a source character image in a new style. FontNet is modified so that the classification model suits style feature encoding and supplies multi-level style features to the subsequent generator. In the feature fusion generator, these multi-level style features are concatenated with content features to alleviate lost details and artifacts in the generated results. A new content similarity loss function based on conditional entropy is also proposed to constrain the network's feature learning in a more reasonable way. Ablation experiments verify the effectiveness of multi-scale feature concatenation for font style transfer and the validity of the conditional-entropy-based content similarity loss function.
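
The abstract describes LineShuffle only as a row-column shuffle that breaks the global contour while keeping stroke structure. As a rough illustration, and not the thesis's actual implementation, the sketch below permutes fixed-width bands of rows and columns of a glyph image. The function name `line_shuffle`, the `band` parameter, and the band-wise permutation itself are all assumptions made for this example.

```python
import random


def line_shuffle(image, band=8, rng=None):
    """Permute bands of rows and bands of columns of a 2-D image.

    Hypothetical sketch: the band-wise permutation and the `band`
    parameter are assumptions, not the definition used in the thesis.
    `image` is a list of equal-length rows (lists of pixel values).
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    if height % band or width % band:
        raise ValueError("image height and width must be multiples of band")

    # Draw a random order for the row bands and for the column bands.
    row_bands = list(range(height // band))
    col_bands = list(range(width // band))
    rng.shuffle(row_bands)
    rng.shuffle(col_bands)

    # Expand the band order into per-pixel source indices. Offsets inside
    # a band stay in their original order, so each band-by-band block of
    # pixels is moved as a whole and local stroke fragments stay intact.
    row_idx = [b * band + i for b in row_bands for i in range(band)]
    col_idx = [b * band + j for b in col_bands for j in range(band)]

    return [[image[r][c] for c in col_idx] for r in row_idx]


# Usage on a toy 16x16 "image" whose pixel value encodes its position.
glyph = [[r * 16 + c for c in range(16)] for r in range(16)]
shuffled = line_shuffle(glyph, band=4, rng=random.Random(0))
```

In this sketch a smaller `band` scrambles the outline more aggressively, while a larger one keeps longer stroke fragments together. How the thesis balances the two is not stated in the abstract.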