Construction and Analysis of Character Networks in Novels: A Case Study of White Deer Plain (《白鹿原》)

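Author: 宋丽
  • Student ID
    2019******
  • Degree
    Doctorate
  • Email
    son******com
  • Defense date
    2024.05.29
  • Advisor
    刘颖
  • Discipline
    Chinese Language and Literature
  • Confidentiality level
    Public
  • Department
    069 School of Humanities (人文学院)
  • Keywords
    character network; anaphora resolution; speaker extraction; open relation extraction; White Deer Plain (《白鹿原》)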

Abstract

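The novel is one of the most important literary genres. Over its long development it has accumulated into an enormous body of work, and individual novels are usually long, which poses great challenges for both literary and applied research; human reading alone falls far short of research needs. Building character networks converts the complex information in unstructured novel text into a structured network form that serves the reading, analysis, and application of novels, and is therefore of considerable research value. In recent years, deep learning methods have achieved breakthroughs in natural language processing, laying a solid technical foundation for the construction and analysis of character networks in novels. Taking modern and contemporary Chinese novels as its object, this study explores methods for building different forms of character networks for novels, builds several character networks for White Deer Plain (《白鹿原》) as an example, and carries out multi-dimensional literary analysis on that basis. First, the study trains a character anaphora resolution model, a speaker extraction model, and a character open relation extraction model for Chinese novels. The character anaphora resolution model ARMCC jointly resolves non-zero character mentions and character zero anaphora, and is used to build character co-occurrence networks; the speaker extraction model SECN uses the classic pre-training plus fine-tuning approach to extract the speaker of each quotation in a Chinese novel, and is used to build character dialogue networks and character mention networks; the character open relation extraction model ORECC dynamically extracts single-span and multi-span relations and identifies non-existent relations, and is used to build character open-relation networks. Next, four kinds of character networks are built for White Deer Plain from the predictions of these models, with manual intervention guided by the actual content of the novel. Each kind of network has a static form and a dynamic form: the static network covers the information of the whole novel and is used to examine the overall character framework at the macro level, while the dynamic network combines the static networks of multiple time slices and is used to examine how characters change as the plot develops. Finally, on the basis of these character networks, the study carries out a literary analysis of White Deer Plain that combines quantitative and qualitative methods. On the one hand, centrality measures and network representation learning methods are used to describe and analyze the character networks in terms of character importance, relatedness between characters, and the similarity of character relationships, to discuss characterization in light of specific content in the novel, and to verify related literary views proposed by earlier scholars. On the other hand, the dynamic networks are used to analyze how the character networks change across time slices, and the novel's characteristics in plot design and character arcs are examined accordingly.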

As one of the most important literary genres, the novel has grown into a vast body of work over its long development, and individual novels are usually very long. Literary and applied research on novels therefore faces enormous challenges, and human reading alone can hardly meet research needs. Constructing character networks transforms the complex information contained in unstructured novel texts into a structured form, thereby serving the reading, analysis, and application of novels, so research on how to construct and analyze character networks of novels is valuable. In recent years, deep learning has achieved breakthroughs in natural language processing, providing a solid technical foundation for the construction and analysis of character networks of novels. Focusing on modern and contemporary Chinese novels, this study explores methods for constructing different types and forms of character networks for novels, takes White Deer Plain as an example to construct several types of character networks, and then conducts multi-dimensional literary analysis based on these networks. The main contributions of this study are as follows.

First, the study trains an anaphora resolution model, a speaker extractor, and an open relation extractor for fictional characters in Chinese novels. The character anaphora resolution model ARMCC jointly tackles zero pronoun resolution and non-zero coreference resolution for character mentions, and can be used to construct character co-occurrence networks. The speaker extractor SECN, trained under the pre-train and fine-tune paradigm, extracts the speaker of each quote in Chinese novels, and can be used to construct character dialogue networks and character mention networks. The open relation extractor ORECC dynamically extracts single-span and multi-span relations and detects non-existent relations, and can be used to construct character open-relation networks.

Second, the study constructs four types of character networks for White Deer Plain based on the predictions of the above models, adjusted manually according to the content of the novel. Each type of network has both a static form and a dynamic form. The static networks contain the information of the entire novel and are used to analyze the overall character framework from a macroscopic angle. The dynamic networks combine the static networks of multiple time slices and are used to analyze how characters change as the story progresses.

Finally, the study conducts quantitative and qualitative literary analysis of White Deer Plain based on the various types and forms of character networks. On the one hand, it describes the character networks using centrality measures and network representation learning methods and, together with the content of the novel, analyzes characterization and verifies relevant literary viewpoints from the perspectives of the importance of characters, the relevance between characters, and the similarity of character relationships. On the other hand, it explores the characteristics of the novel in terms of plot arrangement and character arcs by analyzing how the dynamic character networks change across time slices.
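The relation between the static and dynamic forms of a co-occurrence network, and the use of a centrality measure on each, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the placeholder character names "A" to "D", the toy scene data, the choice of scenes as the co-occurrence unit, and the helper names `cooccurrence_network` and `degree_centrality` are hypothetical and do not come from the thesis, which does not specify its implementation here. The sketch also assumes that anaphora resolution has already mapped every mention to a character.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input (not data from the thesis): each time slice is a list
# of scenes, and each scene is the set of characters mentioned in it after
# anaphora resolution.
TIME_SLICES = [
    [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}],
    [{"B", "D"}, {"C", "D"}],
]


def cooccurrence_network(scenes):
    """Return weighted undirected edges: (u, v) -> number of shared scenes."""
    edges = Counter()
    for scene in scenes:
        # Sorting gives each unordered pair a single canonical key.
        for u, v in combinations(sorted(scene), 2):
            edges[(u, v)] += 1
    return edges


def degree_centrality(edges):
    """Degree centrality: number of distinct neighbours divided by (n - 1)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Dynamic form: one static network per time slice.
dynamic_network = [cooccurrence_network(scenes) for scenes in TIME_SLICES]

# Static form: the edge weights of all slices merged into one network.
static_network = Counter()
for slice_edges in dynamic_network:
    static_network.update(slice_edges)

static_centrality = degree_centrality(static_network)
dynamic_centrality = [degree_centrality(edges) for edges in dynamic_network]
```

Comparing `static_centrality` with the per-slice values in `dynamic_centrality` shows how a character can be prominent in one time slice and absent from another, which is the kind of change the dynamic networks are meant to expose. Degree centrality is only one of several possible centrality measures, chosen here because it needs no external library.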