With the explosive growth of information on the Internet, there is a rapidly increasing demand for extracting structured semantic knowledge from data that are massive, multi-source, heterogeneous, and of uneven quality. Knowledge graphs, which are networks of concepts, entities, and the relations among them, move the description of the objective world from strings to structured semantics, and have become a research hotspot in artificial intelligence. Cross-lingual knowledge graphs use semantic equivalence relations to fuse knowledge expressed in different languages, and thereby build precise semantic models across languages, levels, and granularities. Constructing them is an important direction in knowledge graph construction: it promotes global knowledge sharing, enriches the stock of knowledge, and improves knowledge accuracy.

Building cross-lingual knowledge graphs from wiki knowledge resources is currently the main approach, and semantic relation discovery is its core. This comprises discovering semantic equivalence relations, recognizing semantic is-a (hypernym-hyponym) relations, and predicting semantic linking relations. This paper addresses three problems in existing cross-lingual knowledge graph construction: the insufficient scale of semantic equivalence relations, the noise in semantic is-a relations, and the large number of missing semantic linking relations.

For the discovery of semantic equivalence relations, the language gap makes traditional similarity features sparse and hard to extend to new languages. We propose a discovery method based on heterogeneous network representation learning, which embeds cross-lingual entities in a single low-dimensional vector space where similarity is measured. The deep representation fuses textual context, reference links, and semantic links, and it improves the performance of semantic equivalence relation discovery, recall in particular.

For the recognition of semantic is-a relations, existing is-a relatedness features extend poorly across languages. We propose a recognition method based on cross-lingual knowledge validation. We explore additional heuristic and structural features and, following a cross-lingual learning approach, propose a dynamic adaptive boosting model in which the recognition performance in each language incrementally improves that in the other. Cross-lingual knowledge validation avoids the potential performance degradation of incremental learning.

For the prediction of semantic linking relations, traditional relation extraction methods suffer from data sparsity. We propose a prediction method based on knowledge graph representation learning. Introducing textual context into representation learning enriches the semantic information of the knowledge graph and relieves the constraint that the sparsity of the graph structure places on existing representation learning methods. In addition, the model can represent a single relation with multiple vectors, which substantially improves the prediction of complex semantic linking relations.

Building on these results, we develop XLORE as an application demonstration. To the best of our knowledge, XLORE is the first large-scale open knowledge platform with balanced Chinese and English knowledge. It provides knowledge sharing interfaces for general users, Semantic Web developers, and end applications.
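
As a rough illustration of the equivalence-discovery idea summarized above, the sketch below ranks candidate entities in one language against an entity in another by cosine similarity, once both sit in a shared low-dimensional vector space. It is not the representation learning model itself: the vectors are made-up toy values, and the names `cosine`, `rank_candidates`, and the entity labels are hypothetical. Cosine similarity is also an assumption here, since the abstract does not name the similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(source_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values): one Chinese entity and three English
# candidates that are presumed to live in the same shared space.
zh_entity = [0.9, 0.1, 0.3]
en_candidates = {
    "en:candidate_a": [0.8, 0.2, 0.3],
    "en:candidate_b": [0.1, 0.9, 0.2],
    "en:candidate_c": [0.4, 0.4, 0.6],
}

ranking = rank_candidates(zh_entity, en_candidates)
best_match = ranking[0][0]  # top candidate for the equivalence relation
```

In practice a threshold on the score, or a learned classifier over it, would decide whether the top candidate is accepted as an equivalent entity; that choice is left open here.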