面向低质量图像的场景文字识别系统

Low Quality Images-Oriented Scene Text Recognition System

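Author: 石浩东
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    shd******.cn
  • Defense date
    2023.05.17
  • Advisor
    彭良瑞
  • Discipline
    Electronic Information
  • Pages
    54
  • Confidentiality level
    Public
  • Department
    023 Electronics Department (电子系)
  • Chinese keywords
    场景文字识别,低质量图像,前景掩膜,字符轮廓,跨平台软件开发包
  • English keywords
    scene text recognition, low quality image, foreground mask, character boundary, cross-platform software development kit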

Abstract

In recent years, growing demand from robotics, autonomous vehicles, and mobile computing has made scene text recognition an important research topic. Current deep-learning-based methods recognize sharp images with regular text styles well, but they perform poorly on low-quality scene text images that are distorted, blurred, or low in resolution. This thesis aims to improve existing scene text recognition techniques and, on that basis, to develop an efficient text recognition system for low-quality images. It proposes a new scene text recognition algorithm that exploits the foreground mask and character boundaries of scene text images. The main contributions are as follows.

First, the thesis proposes a multi-task learning method based on foreground mask and character boundary prediction. Starting from a conventional deep-learning-based scene text recognition framework, it adds a parallel foreground mask prediction branch and a character boundary prediction branch to the original convolutional neural network. The resulting framework combines the basic sequence modeling task with the two auxiliary prediction tasks, and the additional foreground mask and character boundary information improves the model's ability to recognize low-quality images.

Second, the thesis proposes an attention mechanism based on the spatial positions of the feature map. The predicted foreground mask and character boundary are used as attention weights to weight the feature map produced by the convolutional neural network, and the weighted features are then passed to the subsequent sequence modeling module for recognition, so that the model focuses on the foreground and boundary regions of the image.

Third, the thesis proposes a mechanism that fuses the foreground mask prediction branch and the character boundary prediction branch through shortcut connections between the branches, yielding better foreground mask and character boundary extraction.

For training, commonly used scene text datasets lack ground truth for foreground masks and character boundaries. To enable supervised learning, the thesis proposes two ways of obtaining it: generating pseudo labels, and using a convolutional neural network to extract foreground masks and character boundaries. Experimental results on seven commonly used English scene text recognition datasets show that the proposed method, which combines foreground mask and character boundary information, outperforms existing methods.

In terms of system development, the thesis designs and implements a text line image recognition technique and system based on multi-task learning, and further develops a cross-platform text recognition software development kit (SDK). Test results meet the originally specified technical requirements.
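
The spatial attention weighting and the multi-task objective described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not say how the two predicted maps are combined or how the task losses are balanced, so the averaging rule, the binary cross-entropy auxiliary losses, and the names `spatial_attention`, `multitask_loss`, `mask_weight`, and `boundary_weight` are all assumptions made for illustration. Plain Python lists stand in for tensors.

```python
import math


def spatial_attention(features, mask, boundary):
    """Weight every channel of a C x H x W feature map by per-position attention.

    Assumption: the two predicted maps are averaged into one weight map;
    the abstract does not state how they are actually combined.
    """
    height, width = len(mask), len(mask[0])
    weights = [
        [0.5 * (mask[y][x] + boundary[y][x]) for x in range(width)]
        for y in range(height)
    ]
    return [
        [[channel[y][x] * weights[y][x] for x in range(width)]
         for y in range(height)]
        for channel in features
    ]


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two H x W maps with values in [0, 1]."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def multitask_loss(recognition_loss, mask_pred, mask_label,
                   boundary_pred, boundary_label,
                   mask_weight=1.0, boundary_weight=1.0):
    """Recognition loss plus weighted auxiliary losses (weights are hypothetical)."""
    return (recognition_loss
            + mask_weight * binary_cross_entropy(mask_pred, mask_label)
            + boundary_weight * binary_cross_entropy(boundary_pred, boundary_label))


# Toy example: one channel, a 1 x 2 feature map.
features = [[[2.0, 4.0]]]
mask = [[1.0, 0.0]]
boundary = [[1.0, 0.5]]
weighted = spatial_attention(features, mask, boundary)
```

In the toy example the first position lies on both the foreground and a boundary and keeps its full activation, while the second position is scaled down. In a real model the maps would come from the two prediction branches, and the labels would be the pseudo labels or network-extracted maps mentioned in the abstract.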