Suicide is a tragedy for both families and society. As social media has become an integral part of daily life, assessing suicidal risk from a person's social media behavior has drawn increasing research attention. The task is difficult, however: suicidal expressions are often implicit, user data is sparse, domain knowledge is lacking, and users' real thoughts are hard to infer. This thesis addresses these challenges through three studies.

Study 1: Social media-based suicidal risk detection via suicide-oriented word embeddings and chronic stress features. Drawing on the suicide "tree hole" phenomenon on Sina Weibo, the study constructs a large-scale suicide dataset of 3,652 users at suicidal risk and 3,677 normal users without suicidal risk. The study builds suicide-oriented word embeddings so that suicide-related expressions can be located quickly within a user's sequence of microblog posts. Because chronic stress is an important factor in the development of suicidal thoughts, the study also builds a chronic stress detection model whose output is passed to the downstream suicide detection model to improve its performance. Finally, the study designs a two-layer attention mechanism that identifies the key posts in a user's post sequence, revealing the user's inner emotional state more accurately. With these techniques, the method achieves 91% accuracy on the Weibo suicide dataset.

Study 2: Social media-based suicidal risk detection via a suicide-oriented personal knowledge graph. The study addresses the lack of domain knowledge in the detection process by constructing a suicide-oriented personal knowledge graph for each user. Drawing on relevant research in psychology, the study considers six categories of factors that contribute to detecting suicidal risk: personal information, personality traits, personal experiences, emotional expression, posting behavior, and social interactions. The study provides methods for computing features for each of the six categories and integrates these features into suicidal risk detection through an attention mechanism. To mitigate data sparsity, the study identifies the target user's friends through mutual-follow relationships on Weibo and uses the friends' information to enrich the target user's data. With these techniques, the method improves accuracy on the Weibo suicide dataset to 93%.

Study 3: Social media-based suicidal risk detection via users' inner thoughts and emotional changes. The study divides suicidal risk detection into three subtasks: (1) revealing the user's true inner thoughts from the user's public posts; (2) inferring the user's implicit emotional changes from the explicit emotional changes extracted from those public posts; and (3) combining the intermediate results of the first two subtasks to detect the user's suicidal risk. The study also integrates the suicide-oriented word embeddings and the suicide-oriented personal knowledge graph into the method, which reaches 95% accuracy on the Weibo suicide dataset, the highest reported to date.
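The friend-discovery step in Study 2 can be illustrated with a small sketch. The abstract states only that friends are found through follow relationships on Weibo; the code below assumes a friend is any account that the target user follows and that follows the target user back. The function name `mutual_friends`, the dictionary layout, and the toy account names are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, Set


def mutual_friends(user: str, following: Dict[str, Set[str]]) -> Set[str]:
    """Return the accounts that `user` follows and that follow `user` back.

    `following` maps each account to the set of accounts it follows.
    This layout is an assumption made for illustration only.
    """
    followed_by_user = following.get(user, set())
    return {
        other
        for other in followed_by_user
        if other != user and user in following.get(other, set())
    }


# Hypothetical toy follow graph: each key follows the accounts in its set.
following = {
    "target": {"alice", "bob", "news_account"},
    "alice": {"target", "bob"},
    "bob": {"alice"},
    "news_account": set(),
}

friends = mutual_friends("target", following)
print(sorted(friends))  # prints ['alice']
```

Only `alice` qualifies here, because `bob` and `news_account` do not follow `target` back. In a setting like the one the abstract describes, the posts of such mutually following accounts would then supplement the target user's own sparse data; how that information is combined is not specified in the abstract and is left out of this sketch.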