In recent years, urban rail transit in China has expanded rapidly and now ranks among the largest in the world, yet knowledge management remains generally weak in urban rail enterprises. Company A, an urban rail consulting firm that serves as an industry think tank, leverages its first-mover advantage to provide whole-process, multi-disciplinary integrated consulting services for urban rail investment, construction, and operation management. However, constrained by a traditional operating model, talent shortages, and fragmented knowledge, Company A faces challenges to its sustainable development. It is therefore necessary to adopt digital means to share and accumulate domain knowledge, strengthening the role of knowledge as the core support for a consulting company's development. A knowledge graph aggregates industry information, data, and relationships into domain knowledge and gives information resources a stronger visual character, making them easier to compute, understand, and evaluate. Because the line is the object and value carrier of urban rail transit investment, construction, and operation, it can integrate domain knowledge to the greatest extent. Based on a needs analysis of Company A, a representative consulting enterprise in the industry, this thesis builds a knowledge model of urban rail transit lines and constructs an urban rail line dataset through manual annotation. Instead of the traditional pipeline approach, in which entity recognition and relation extraction are performed as separate steps, a deep-learning-based joint entity-relation extraction method is used. Through subsequent entity alignment and knowledge storage, a knowledge graph of urban rail transit lines is constructed.
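As a rough illustration of how joint extraction differs from a pipeline, the following sketch decodes triples in the spirit of the CasRel cascade binary tagging scheme named in item (2) below: subjects are tagged first, and then, for each subject, a separate start/end tagger per relation marks its objects, so one subject can appear in several triples. It assumes the model's sigmoid outputs have already been thresholded into 0/1 tags and uses the common "nearest end at or after each start" decoding heuristic. The `toy_object_tagger` function, the relation names, and the example sentence are hypothetical stand-ins for a trained model and the thesis dataset, not the author's implementation.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]              # inclusive (start, end) token indices
Tags = Tuple[List[int], List[int]]  # (start_tags, end_tags), one 0/1 per token


def one_hot(n: int, positions: List[int]) -> List[int]:
    """Build a 0/1 tag list of length n with 1s at the given positions."""
    return [1 if i in positions else 0 for i in range(n)]


def decode_spans(start_tags: List[int], end_tags: List[int]) -> List[Span]:
    """Pair each tagged start with the nearest tagged end at or after it."""
    spans = []
    for i, is_start in enumerate(start_tags):
        if not is_start:
            continue
        for j in range(i, len(end_tags)):
            if end_tags[j]:
                spans.append((i, j))
                break  # a start with no later end is dropped
    return spans


def extract_triples(tokens: List[str], subject_tags: Tags,
                    object_tagger: Callable[[Span], Dict[str, Tags]]) -> List[Tuple[str, str, str]]:
    """Stage 1: decode subjects; stage 2: for each subject, decode objects per relation."""
    triples = []
    for s_start, s_end in decode_spans(*subject_tags):
        subject = "".join(tokens[s_start:s_end + 1])
        # One (start, end) tag pair per relation, conditioned on this subject,
        # so the same subject can take part in several triples (overlap).
        for relation, (o_starts, o_ends) in object_tagger((s_start, s_end)).items():
            for o_start, o_end in decode_spans(o_starts, o_ends):
                triples.append((subject, relation, "".join(tokens[o_start:o_end + 1])))
    return triples


# Hypothetical example: character-level tokens of "地铁5号线起于甲站止于乙站"
tokens = list("地铁5号线起于甲站止于乙站")
n = len(tokens)


def toy_object_tagger(subject: Span) -> Dict[str, Tags]:
    # Stand-in for the subject-conditioned object taggers of a trained model
    if subject == (0, 4):
        return {"起点站": (one_hot(n, [7]), one_hot(n, [8])),
                "终点站": (one_hot(n, [11]), one_hot(n, [12]))}
    return {}


example_triples = extract_triples(tokens, (one_hot(n, [0]), one_hot(n, [4])), toy_object_tagger)
print(example_triples)
# [('地铁5号线', '起点站', '甲站'), ('地铁5号线', '终点站', '乙站')]
```

Because objects are tagged separately for every relation and every subject, overlapping triples such as the two sharing the subject "地铁5号线" come out of a single pass, whereas a pipeline would have to classify each candidate entity pair after a separate recognition step.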
The main work of this thesis is as follows:
(1) Based on an analysis of Company A's business needs, an ontology of urban rail transit lines is constructed top-down following Stanford's "seven-step method". The concepts in the ontology, their attributes, and the relationships between them are defined; the ontology is built with the Protégé ontology editor and passes a consistency check.
(2) To handle the extensive relation overlap in engineering information texts such as urban rail public tendering documents, this thesis applies the CasRel (Cascade Binary Tagging Framework for Relational Triple Extraction) deep learning framework to jointly extract entity-relation triples from the urban rail transit line dataset. It also attempts to improve this framework and shows that the optimized algorithm can improve data processing efficiency.
(3) To apply the knowledge graph at Company A, the extracted triples undergo entity alignment and are stored in a graph database (Neo4j), completing the construction of the urban rail transit line knowledge graph. A visualization application is then used to analyze how the knowledge graph can empower Company A's business.
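The alignment and storage step in item (3) can be sketched as follows, under loudly stated assumptions: entity alignment is approximated by a hypothetical rule-based pass (Unicode NFKC normalization, whitespace removal, and a hand-written alias table), and each aligned triple is turned into a parameterized Cypher `MERGE` statement. The `Entity` label, the generic `RELATION` edge with a `type` property, the alias table, and the sample triples are all illustrative choices, not the schema or alignment method used in the thesis.

```python
import unicodedata
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Parameterized Cypher; MERGE matches existing nodes/edges instead of duplicating them.
# Relationship types cannot be passed as parameters in Cypher, so the relation
# name is stored as a property on a generic RELATION edge (an assumption).
CYPHER = (
    "MERGE (s:Entity {name: $subject}) "
    "MERGE (o:Entity {name: $object}) "
    "MERGE (s)-[:RELATION {type: $relation}]->(o)"
)


def normalize(name: str) -> str:
    """Fold full-width characters with NFKC and remove all whitespace."""
    return "".join(unicodedata.normalize("NFKC", name).split())


def align(triples: List[Triple], aliases: Dict[str, str]) -> List[Triple]:
    """Map each entity to a canonical name and drop duplicate triples."""
    seen = set()
    aligned = []
    for subject, relation, obj in triples:
        s = normalize(subject)
        o = normalize(obj)
        triple = (aliases.get(s, s), relation, aliases.get(o, o))
        if triple not in seen:
            seen.add(triple)
            aligned.append(triple)
    return aligned


def to_cypher(triples: List[Triple]) -> List[Tuple[str, Dict[str, str]]]:
    """Pair the MERGE statement with the parameters for each triple."""
    return [(CYPHER, {"subject": s, "relation": r, "object": o}) for s, r, o in triples]


# Hypothetical extracted triples and alias table
raw = [
    ("地铁\uff15号线", "起点站", "甲站"),  # full-width digit 5
    ("5号线", "起点站", "甲站"),           # alias of the same line
    ("地铁5号线", "终点站", "乙 站"),       # stray space
]
aliases = {"5号线": "地铁5号线"}

aligned = align(raw, aliases)
statements = to_cypher(aligned)
print(aligned)
# [('地铁5号线', '起点站', '甲站'), ('地铁5号线', '终点站', '乙站')]
```

Each `(query, params)` pair could then be executed against a Neo4j instance, for example with `session.run(query, params)` in the official Neo4j Python driver. Keeping values in parameters rather than splicing them into the query string avoids quoting problems with Chinese entity names.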