Turnouts and point machines are key items of railway track infrastructure, and accurately monitoring their operating condition is essential to safe railway operation. In recent years, the rapid development of high-speed railways in China has produced an ever-growing volume of operational monitoring data, most of which is unlabeled historical data. This challenges traditional condition-monitoring methods, which rely on manual data analysis or supervised learning. Using an unlabeled dataset of A-phase operating current curves of S700K point machines provided by Beijing National Railway Research & Design Institute of Signal & Communication Ltd., this thesis studies pattern mining, anomaly detection and anomaly pattern classification for point machine current curves. The main contributions are as follows.
(1) Based on mechanism knowledge of point machine current curves, the dataset is first divided by curve length into four subsets: an ultra-short curve dataset and a short curve dataset, both with small sample sizes; a normal-length curve dataset with a very large sample size; and an ultra-long curve dataset whose curves can be directly identified as faulty. Then, by comparing different feature extraction methods, a pattern mining method suited to the sample size of each subset is given. For the ultra-short and short curve datasets, the method combines t-SNE (t-Distributed Stochastic Neighbor Embedding) feature extraction with DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering. For the normal-length curve dataset, it combines feature extraction by a DDAE (Deep Denoising Autoencoder) with DBSCAN clustering.
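The length-based split and the density clustering used for pattern mining in (1) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the thesis's code: the length cut-offs in `BOUNDS` are hypothetical (the abstract gives no values), and the 2-D points passed to `dbscan` stand in for features that would come from t-SNE (for example scikit-learn's `sklearn.manifold.TSNE`) or from a DDAE encoder, both assumed to be computed elsewhere.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

# HYPOTHETICAL length cut-offs (samples per curve); the real values are not
# stated in the abstract and would come from point machine mechanism knowledge.
BOUNDS = {"ultra_short": 50, "short": 100, "normal": 400}


def split_by_length(curves: Sequence[Sequence[float]],
                    bounds: Dict[str, int] = BOUNDS) -> Dict[str, List[int]]:
    """Return the indices of the curves that fall into each length-based subset."""
    groups: Dict[str, List[int]] = {"ultra_short": [], "short": [],
                                    "normal": [], "ultra_long": []}
    for i, curve in enumerate(curves):
        n = len(curve)
        if n < bounds["ultra_short"]:
            groups["ultra_short"].append(i)
        elif n < bounds["short"]:
            groups["short"].append(i)
        elif n <= bounds["normal"]:
            groups["normal"].append(i)
        else:
            groups["ultra_long"].append(i)
    return groups


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, or -1 for noise."""
    def neighbours(i: int) -> List[int]:
        # Euclidean eps-neighbourhood; includes the point itself.
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    UNVISITED = -2
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:      # not a core point: noise for now
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:           # earlier noise becomes a border point
                labels[j] = cluster
            if labels[j] != UNVISITED:    # already assigned: do not expand
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (10.0, 0.0)]
    print(dbscan(pts, eps=0.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
    print(split_by_length([[0.0] * 30, [0.0] * 80, [0.0] * 200, [0.0] * 600]))
```

As in the standard DBSCAN formulation, a point counts itself toward `min_samples`, and points not density-reachable from any core point keep the label `-1`; in a pattern mining setting such noise points could be treated as candidates for rare or abnormal curve patterns.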
With these pattern mining methods, the thesis achieves good results on the ultra-short and short curve datasets, while the results on the normal-length curve dataset leave room for improvement.
(2) To address the shortcomings of the DDAE and DBSCAN based method on the normal-length curve dataset, an iterative DBSCAN clustering scheme based on inlier-outlier partitioning is proposed, which effectively improves pattern mining on that dataset. Drawing on mechanism knowledge of point machines, the current curve patterns mined from the ultra-short, short and normal-length curve datasets are then summarized.
(3) Building on the pattern mining results of (1) and (2), a DDAE-based anomaly detection algorithm is designed to detect abnormal samples, together with a Softmax Regression classifier and a one-dimensional convolutional neural network (1D CNN) classifier that further classify the abnormal patterns. These algorithms achieve satisfactory results on the normal-length curve dataset.
Although this thesis focuses on current curves, the algorithms can also be applied to pattern mining, anomaly detection and pattern classification of other point machine signal curves, such as power and thrust curves.
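The abstract does not specify the decision rule of the DDAE-based detector or the exact form of the classifiers. One common setup, assumed here, flags a curve as abnormal when its autoencoder reconstruction error exceeds a threshold fitted on training curves, and then assigns the abnormal curve a pattern with a Softmax Regression model, p = softmax(Wx + b). The sketch below shows only these decision rules: the reconstructions, the quantile and the weights `W`, `b` are hypothetical inputs rather than trained values from the thesis, and the 1D CNN classifier is not shown.

```python
import math
from typing import List, Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between a curve and its (assumed) DDAE reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(errors: Sequence[float], quantile: float = 0.99) -> float:
    """ASSUMED rule: threshold = empirical quantile of training reconstruction errors."""
    if not errors:
        raise ValueError("need at least one training error")
    ordered = sorted(errors)
    k = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[k]


def is_anomalous(error: float, threshold: float) -> bool:
    """Flag a curve whose reconstruction error exceeds the fitted threshold."""
    return error > threshold


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def softmax_regression_predict(W: Sequence[Sequence[float]],
                               b: Sequence[float],
                               features: Sequence[float]) -> int:
    """Index of the most probable class under p = softmax(W x + b)."""
    logits = [sum(w * f for w, f in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    thr = fit_threshold([0.10, 0.12, 0.15, 0.20], quantile=0.75)
    err = reconstruction_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
    print(thr, is_anomalous(err, thr))  # 0.2 True
    # HYPOTHETICAL 3-class weights over 2 features, for illustration only.
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    print(softmax_regression_predict(W, [0.0, 0.0, 0.0], [0.2, 0.9]))  # 1
```

Separating detection from classification in this way means the classifier only needs to distinguish abnormal patterns, while the threshold quantile controls the trade-off between missed anomalies and false alarms.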