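X-ray security scanners can rapidly image the objects inside luggage and are widely deployed in public places to safeguard public safety. However, the current security inspection system relies too heavily on the subjective judgment of screeners and is prone to missed and false detections. Research on methods for automatically detecting prohibited items in X-ray security images is therefore of great significance. In recent years, deep-learning-based general object detection has made considerable progress. However, X-ray security images carry little texture, and public datasets contain few prohibited items. As a result, general object detectors perform poorly on X-ray security images: they tend to produce false detections on small targets and struggle to classify prohibited items correctly or localize them precisely. To address both the characteristics of X-ray security images and the problems general object detectors exhibit on them, we study deep-learning-based prohibited item detection in X-ray security images. The main research contents and contributions of this paper are as follows.

First, general object detectors tend to produce false detections on small targets in X-ray security images. To address this, this paper proposes an X-ray prohibited item detection algorithm with an object scale prior. Given the imaging geometry of the X-ray scanner, the area of an object in the X-ray image can be used to compute the area of the real object's projection onto the conveyor-belt plane. A probability distribution function of this projected area is then built for each class of prohibited item. The distribution functions serve two roles: they supervise the training of the detector in the form of a loss function, and they correct the detector's output probabilities. In addition, this paper proposes an efficient negative sample selection strategy that raises the chance of hard negative samples being selected. Both the strategy and the object scale prior are integrated into a two-stage object detection framework.

Second, X-ray security images contain little texture. To address this, this paper proposes a bimodal-fusion prohibited item detection algorithm based on material thickness estimation. The pseudo-color images generated by dual-energy X-ray scanners implicitly contain the material and thickness information of objects. From the spatial distribution of colors in an X-ray security image, the spatial distribution of inorganic materials, organic materials and mixtures can be obtained. The X-ray attenuation formula, combined with the brightness of the X-ray image, then gives a rough estimate of the relative thickness among objects of the same material, yielding a thickness-modality image. Within a feature pyramid structure, the proposed master-slave attention module fuses the features of the pseudo-color modality and the thickness modality, and prohibited items are detected on the fused features.

Finally, public X-ray security datasets contain few prohibited items. To address this, this paper proposes a cross-modal knowledge distillation algorithm with a multi-level intermediate-layer connection structure. The algorithm uses a detection model trained on a large-scale natural-light dataset to guide the training of a prohibited item detection model on limited X-ray security data, improving detector performance when prohibited-item data are scarce. In addition, practical security inspection may require a lightweight detection network. For this case, this paper also proposes a knowledge distillation algorithm that is both cross-modal and cross-model.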
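The thickness-modality step in the second contribution can be illustrated with a minimal sketch. It first labels each pixel by color and then estimates relative thickness from attenuation. The sketch rests on several assumptions that are not taken from the method itself:

- The scanner uses the common dual-energy palette: orange for organic materials, green for mixtures and blue for inorganic materials.
- Image luma stands in for the transmitted intensity I.
- The white background stands in for the unattenuated intensity I0.

The function names, hue thresholds and label constants are hypothetical. Under the Beer-Lambert law I = I0·exp(-μt), the product μt equals -ln(I/I0). Because μ is roughly shared within one material, ratios of μt within that material are ratios of t. This is why only a relative thickness can be recovered.

```python
import colorsys
import math

# Hypothetical material labels used only in this sketch.
ORGANIC, MIXTURE, INORGANIC, BACKGROUND = 0, 1, 2, 3


def classify_material(r, g, b, bg_value=0.95, bg_saturation=0.1):
    """Label an RGB pixel (floats in [0, 1]) by hue.

    Assumes the common palette (orange = organic, green = mixture,
    blue = inorganic); the hue bands are illustrative, not calibrated.
    """
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if v >= bg_value and s <= bg_saturation:
        return BACKGROUND  # near-white pixel: little or no attenuation
    hue_deg = h * 360.0
    if hue_deg < 75.0 or hue_deg >= 330.0:
        return ORGANIC
    if hue_deg < 165.0:
        return MIXTURE
    return INORGANIC


def brightness(r, g, b):
    """Luma with ITU-R BT.601 weights, used as a stand-in for I / I0."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def relative_thickness(image, eps=1e-3):
    """Return (labels, thickness) maps for an image given as rows of (r, g, b).

    Beer-Lambert: I = I0 * exp(-mu * t), so mu * t = -ln(I / I0) with I0 = 1.
    Dividing by the largest mu * t found for the same material cancels mu
    and gives a relative thickness in [0, 1].
    """
    labels = [[classify_material(*px) for px in row] for row in image]
    # Clamp brightness to [eps, 1] so the logarithm is finite and non-negative.
    atten = [[-math.log(min(max(brightness(*px), eps), 1.0)) for px in row]
             for row in image]

    # Largest attenuation per material label (background excluded).
    max_atten = {}
    for lab_row, att_row in zip(labels, atten):
        for lab, att in zip(lab_row, att_row):
            if lab != BACKGROUND:
                max_atten[lab] = max(max_atten.get(lab, 0.0), att)

    thickness = [
        [
            att / max_atten[lab]
            if lab != BACKGROUND and max_atten[lab] > 0.0 else 0.0
            for lab, att in zip(lab_row, att_row)
        ]
        for lab_row, att_row in zip(labels, atten)
    ]
    return labels, thickness
```

For example, a light-orange pixel and a dark-orange pixel both receive the organic label. The darker one gets the larger relative thickness because it transmits less intensity.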
X-ray scanners are widely used to protect public safety because they can quickly produce clear images of the objects inside baggage. However, the existing security inspection system relies too heavily on the subjective judgment of inspectors and is prone to missed and false detections. It is therefore important to develop methods for automatically detecting prohibited objects in X-ray baggage images. Deep-learning-based general object detection methods have made great progress in recent years. However, they do not work well on X-ray security images, because these images carry little texture and X-ray security datasets contain few prohibited objects. These methods are prone to false detections of small targets, and they cannot accurately classify or localize prohibited objects. To address these problems while taking the characteristics of X-ray security images into account, we study deep-learning-based prohibited object detection in X-ray security images. The main works and the corresponding novelties of this paper are summarized as follows.

Firstly, we propose a prohibited object detection method based on a physical size prior, to reduce the false detections of small targets that general object detection methods are prone to. Based on the imaging geometry of X-ray scanners, we calculate an object's projected area on the plane of the conveyor belt by counting the pixels it covers in the X-ray image. A probability distribution function of the projected area is then constructed for each class of prohibited object. On the one hand, the distribution functions are formulated as an extra loss function to supervise the training of the detector. On the other hand, they are used to correct the detector's output probabilities. Besides, we also propose an efficient negative sample selection strategy, which increases the probability of hard negative samples being selected.
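The physical size prior described above can be sketched in a few lines. The sketch makes four assumptions:

- Each image pixel covers a fixed, known footprint on the belt plane. The constants `PIXEL_WIDTH_MM` and `PIXEL_LENGTH_MM` are hypothetical calibration values. A real fan-beam scanner needs a geometry-dependent mapping instead.
- The per-class area distribution is log-normal. The abstract does not name the distribution family.
- The hypothetical function `rescore` corrects output probabilities by multiplying each class score by its area likelihood and renormalizing.
- The hypothetical function `size_prior_loss` is the expected negative log-likelihood of the area under the predicted class probabilities.

These choices are illustrations, not the paper's exact formulation.

```python
import math
from statistics import mean, stdev

# Hypothetical calibration: footprint of one image pixel on the conveyor-belt
# plane, in mm. Treating both as constants is a simplifying assumption.
PIXEL_WIDTH_MM = 1.0   # across the belt (depends on fan-beam geometry)
PIXEL_LENGTH_MM = 1.0  # along the belt (depends on belt speed and line period)


def projected_area_mm2(pixel_count, px_width=PIXEL_WIDTH_MM, px_length=PIXEL_LENGTH_MM):
    """Approximate an object's projected area on the belt plane from its pixel count."""
    return pixel_count * px_width * px_length


class LogNormalAreaPrior:
    """Per-class density of projected area; the log-normal form is an assumption."""

    def __init__(self, areas):
        logs = [math.log(a) for a in areas]  # needs at least two positive areas
        self.mu = mean(logs)
        self.sigma = stdev(logs)

    def pdf(self, area):
        z = (math.log(area) - self.mu) / self.sigma
        return math.exp(-0.5 * z * z) / (area * self.sigma * math.sqrt(2.0 * math.pi))


def rescore(class_probs, area, priors):
    """Hypothetical correction: weight class scores by area likelihood, renormalize."""
    weighted = {c: p * priors[c].pdf(area) for c, p in class_probs.items()}
    total = sum(weighted.values())
    if total == 0.0:  # area implausible for every class: keep the original scores
        return dict(class_probs)
    return {c: w / total for c, w in weighted.items()}


def size_prior_loss(class_probs, area, priors, eps=1e-12):
    """Hypothetical extra loss: expected negative log area-likelihood."""
    return -sum(p * math.log(priors[c].pdf(area) + eps) for c, p in class_probs.items())


if __name__ == "__main__":
    # Made-up training areas in mm^2, for illustration only.
    priors = {
        "knife": LogNormalAreaPrior([2500.0, 3000.0, 3500.0]),
        "gun": LogNormalAreaPrior([18000.0, 20000.0, 22000.0]),
    }
    area = projected_area_mm2(3000)  # a detection covering 3000 pixels
    print(rescore({"knife": 0.5, "gun": 0.5}, area, priors))
```

With these made-up numbers, an ambiguous 50/50 detection whose area matches typical knives is pushed almost entirely toward "knife". The loss term likewise penalizes predicting "gun" for an object of that size.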
The proposed physical size prior and negative sample selection strategy are integrated into a two-stage detection framework.

Secondly, to deal with the lack of texture in X-ray security images, we propose a prohibited object detection method based on material thickness estimation and bimodal fusion. The pseudo-color X-ray images generated by dual-energy X-ray scanners implicitly contain the material and thickness information of objects. Thus, the spatial distributions of inorganic materials, organic materials and mixtures can be inferred by classifying the pixels of the X-ray images by color. The relative thickness of objects of the same material can then be roughly estimated from the X-ray attenuation formula and the brightness of the X-ray images. We fuse the thickness modality and the pseudo-color modality with the proposed master-slave attention module under the feature pyramid structure, and then detect prohibited objects on the fused features.

Finally, to relieve the shortage of prohibited objects in X-ray security datasets, we propose a cross-modal knowledge distillation method with multi-level intermediate connections. With this method, an X-ray prohibited object detection model is guided to reconstruct the intermediate representations produced by a network pre-trained on a large-scale natural-light dataset. This improves the performance of the detector when X-ray security images are limited. Practical X-ray security inspection may also require a lightweight detection network. For this case, we further study distilling representations across both architectures and modalities.
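The idea of guiding a student to reconstruct a teacher's intermediate representations at several levels can be sketched as a multi-level feature-matching loss, in the spirit of hint-based distillation. The sketch assumes the following:

- Features are flattened into plain Python lists.
- The teacher is the frozen natural-light network, and its features are fixed inputs.
- Each level has a hypothetical linear adapter that maps student features to the teacher's dimensionality. The adapter is what allows the student to have a different (for example, lighter) architecture.
- Distillation is measured with mean squared error and combined with the detection loss through a weight `lam`.

All names here are illustrative. They are not the paper's connection structure or loss.

```python
def linear(weight, x):
    """Apply a weight matrix (list of rows) to a feature vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature lengths differ after adaptation")
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def multi_level_distill_loss(student_feats, teacher_feats, adapters, level_weights):
    """Weighted sum over levels of MSE(adapter(student), teacher).

    The teacher features are treated as constants (the teacher is frozen).
    Adapters let student and teacher differ in width, which is what a
    cross-architecture (e.g. lightweight student) setting needs.
    """
    if not (len(student_feats) == len(teacher_feats) == len(adapters) == len(level_weights)):
        raise ValueError("need one student feature, teacher feature, adapter and weight per level")
    return sum(
        w * mse(linear(a, s), t)
        for s, t, a, w in zip(student_feats, teacher_feats, adapters, level_weights)
    )


def total_loss(detection_loss, distill_loss, lam=1.0):
    """Hypothetical training objective: detection loss plus weighted distillation loss."""
    return detection_loss + lam * distill_loss
```

In a real detector, the per-level features would come from the backbone or feature pyramid stages, and the adapters would be learned layers such as 1×1 convolutions. The list-based version only shows how the per-level terms are aligned, weighted and summed.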