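With the spread of the mobile internet and the rise of social media, platforms such as Facebook, Twitter, WeChat, and Sina Weibo continuously generate massive amounts of data on user behavior and social interaction, offering a new way to identify user personality computationally. Personality prediction based on social media not only opens a new direction for computational psychology but also has a lasting commercial impact on practical applications such as digital marketing, personalized recommendation, psychological diagnosis, and human resources management. Most existing work on social media personality computing predicts personality from text, and few studies examine the relationship between multimodal social media information and personality. How do personality traits relate to a user's choice of avatar, the content posted to WeChat Moments, and the use of emoji? Can multimodal social media information improve personality prediction? How does a user's information on multiple social media platforms affect personality prediction? These questions remain to be explored. Through two case studies, this paper carries out social media personality prediction based on multimodal information from WeChat and Weibo in an attempt to answer them. We collected the WeChat and Sina Weibo data of 389 social media users and measured their personality traits with a Big Five personality questionnaire; we then used a stacked generalization ensemble method to integrate users' social media interaction information, including text, avatars, and emoji, to predict their Big Five personality traits. We compared the predictive performance of the different interaction behaviors (text, avatars, and emoji) and the predictive power of the two platforms, WeChat and Weibo. The results show that avatars contribute more to prediction accuracy than emoji or text; that WeChat predicts openness, agreeableness, extraversion, and conscientiousness better, whereas Weibo predicts emotional stability better; and that, compared with using a single type of interaction information or data from a single platform, using multimodal information or data from multiple platforms improves personality prediction. All experiments were repeated 10 times with random dataset splits, and the ensemble model was significantly better at the 0.05 level. For binary classification of the five traits with the extreme split, the average AUC was 0.80, F1 was 0.74, precision was 0.80, and recall was 0.71; with the mean split, the average AUC was 0.77, F1 was 0.76, precision was 0.72, and recall was 0.79.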
Social media platforms such as WeChat, Sina Weibo, Facebook, LinkedIn, and Twitter have grown dramatically over the past few years and generate massive amounts of data every second. These data contain rich information about human behavior and social interaction, which provides a new way to identify a person's personality traits. A comprehensive understanding of user personality is not only essential to many scientific disciplines but also has a far-reaching business impact on real-life applications such as online marketing, personalized recommendation, mental diagnosis, and human resources management. Most social media platforms give users various ways to interact with others: posting text content, chatting with emoji, building a user profile, and selecting avatars. Previous studies have demonstrated that language use on social media is effective for automatic personality prediction. However, beyond language features alone, how to leverage heterogeneous information from multiple interaction types on social media to better understand user personality remains a less researched direction. How does language use differ across platforms? Can information from multiple social media platforms improve personality prediction performance? The answers to these questions are not well understood. In this paper, we carry out a cross-interaction, multi-platform personality prediction study comprising two real-world case studies that try to answer these questions. We collected the WeChat and Weibo data of 389 users and obtained their Big Five personality traits with the BFI-44. We applied a stacked generalization-based ensemble method that integrates heterogeneous information, including users' own language use, avatars, emoji, and usage frequency, to predict users' personality traits. We compared the performance of the various social media interactions in personality prediction.
We also compared the predictive power of the two social media platforms for personality using a data set of users who are active on both WeChat and Weibo. The results reveal that avatar features outperform other features for all personality traits. WeChat Moments and Weibo microblog posts are not good predictors of personality. WeChat yields better predictions for Openness, Agreeableness, Extraversion, and Conscientiousness, and Weibo yields better predictions for Neuroticism. Compared with prediction from a single type of social media interaction or from a single platform's data, ensemble-based prediction from cross-interaction heterogeneous information or from multiple platforms' data improves personality prediction performance. All experiments were repeated 10 times with randomized dataset splits, and the significance of the ensemble models was verified at the p < .05 level. Among all models, the ensemble model performed best, averaging 0.80 AUC, 0.74 F1-score, 0.80 precision, and 0.71 recall for extreme-split classification and 0.77 AUC, 0.76 F1-score, 0.72 precision, and 0.79 recall for mean-split classification.
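The two binary labeling schemes named above, mean split and extreme split, can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not state the cut-offs used, so the extreme-split rule below (keep only scores at least `k` standard deviations from the sample mean, drop the rest) and the function names `mean_split` and `extreme_split` are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def mean_split(scores):
    """Label a trait score 1 if it is above the sample mean, else 0."""
    m = mean(scores)
    return [1 if s > m else 0 for s in scores]


def extreme_split(scores, k=1.0):
    """Label only the extremes of the score distribution.

    ASSUMPTION: the threshold mean +/- k * SD is illustrative; the
    source does not specify how the extreme groups were defined.
    Returns 1 for high scorers, 0 for low scorers, and None for
    middle scorers, who would be excluded from training and testing.
    """
    m, sd = mean(scores), pstdev(scores)
    if sd == 0:
        # No spread in the scores, so there are no extremes to label.
        return [None] * len(scores)
    high, low = m + k * sd, m - k * sd
    labels = []
    for s in scores:
        if s >= high:
            labels.append(1)
        elif s <= low:
            labels.append(0)
        else:
            labels.append(None)
    return labels


# Hypothetical trait scores on a 1-5 questionnaire scale.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(mean_split(scores))     # [0, 0, 0, 1, 1]
print(extreme_split(scores))  # [0, None, None, None, 1]
```

Under a rule like this, the extreme split discards the ambiguous middle of the distribution, so it trains and tests on fewer but more clearly separated users, whereas the mean split keeps every user.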