
Rule-based Representation Learning for Interpretable Data Modeling

Author: Wang Zhuo (王焯)
  • Student ID
    2018******
  • Degree
    Doctoral
  • Email
    wan******com
  • Defense date
    2023.05.15
  • Advisor
    Wang Jianyong (王建勇)
  • Discipline
    Computer Science and Technology
  • Pages
    113
  • Classification level
    Public
  • Department
    024 Department of Computer Science and Technology
  • Keywords
    Interpretability, Rule-based Model, Representation Learning, Scalability, Neuro-symbolic Learning

Abstract


Rule-based models, benefiting from their transparent inner structures and good expressivity, play an important role in domains that demand high model interpretability, such as medicine, finance, and politics. However, conventional rule-based models are hard to optimize, especially on large data sets, due to their discrete parameters and structures, which severely limits their application scope. Ensemble methods and soft or fuzzy rules are commonly used to improve predictive performance, but they sacrifice interpretability. Although neuro-symbolic models can learn rules through neural networks, several problems remain open: how to learn rule-based representations efficiently, how to improve scalability, and how to search for discrete solutions in continuous spaces. The main contributions of this dissertation are as follows:

1. Interpretable data representation learning based on discrete rules. To meet the requirements of both predictive performance and interpretability, we propose a hierarchical transparent model for classification, the Concept Rule Set (CRS), which learns data representations with propositional logic rules. To train the non-differentiable CRS efficiently, we propose the Multilayer Logical Perceptron (MLLP) and the Random Binarization (RB) training method. Experiments on 12 public data sets show that CRS performs comparably to complex "black-box" models, while its model complexity is close to that of simple models such as decision trees.

2. Scalable discrete rule-based representation learning for large-scale data. Although CRS realizes efficient rule-based representation learning, it still scales poorly and struggles with large data sets. To solve these problems, we propose the Rule-based Representation Learner (RRL), a classifier that automatically learns interpretable non-fuzzy rules for data representation and classification. To improve the scalability of RRL and make efficient optimization of large-scale RRL possible, we propose improved logical activation functions and Gradient Grafting, a gradient-based training method that directly optimizes the discrete model. Extensive experiments show that RRL significantly outperforms competitive interpretable approaches in classification performance and scalability, and achieves results comparable to those of complex non-interpretable models.

3. Deep rule-based representation learning with decoupled computation. Although RRL can handle large-scale data, it still suffers from coupled computation and difficulty with deep structures. To address these issues, we first propose novel logical activation functions based on matrix multiplication, which decouple the computation. For training deep RRL, we further propose hierarchical Gradient Grafting. Experiments show that the new logical activation functions are faster and consume fewer resources, and that hierarchical Gradient Grafting can effectively train deep RRL.

4. Rule-based representation learning for dementia diagnosis. Medical decision-support systems need interpretability to ensure patient safety and support doctors' decision-making. Taking dementia diagnosis as an example, we design a framework that uses RRL to build interpretable diagnostic rules. The framework first generates candidate rule sets with different model complexities and diagnostic performances; doctors then analyze, trade off, and select the final rules based on visualizations and test results. Experiments show that RRL-based dementia diagnosis not only achieves good accuracy and interpretability but is also easy to deploy and adopt in practice.
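The "logical activation functions" mentioned above relax Boolean conjunction and disjunction into differentiable operations, so that rule layers can be trained with gradients. The exact functions used by CRS/RRL may differ in detail; the following is a minimal sketch of the standard product-based relaxation, where a weight matrix selects which inputs each rule node depends on:

```python
import numpy as np

def conjunction(h, W):
    """Soft logical AND: node j is the conjunction of the inputs
    selected by column j of W.  With h and W in {0, 1} this reduces
    to exact Boolean AND; continuous values give a relaxation."""
    # 1 - W*(1 - h): an input only lowers the product when it is
    # both selected (w = 1) and false (h = 0).
    return np.prod(1.0 - W * (1.0 - h[:, None]), axis=0)

def disjunction(h, W):
    """Soft logical OR, via De Morgan: OR(x) = 1 - AND(NOT x)."""
    return 1.0 - np.prod(1.0 - W * h[:, None], axis=0)

h = np.array([1.0, 0.0, 1.0])   # three Boolean inputs
W = np.array([[1.0, 0.0],       # node 0 selects inputs 0 and 2
              [0.0, 1.0],       # node 1 selects inputs 1 and 2
              [1.0, 1.0]])
print(conjunction(h, W))  # [1. 0.]  i.e. (h0 AND h2), (h1 AND h2)
print(disjunction(h, W))  # [1. 1.]  i.e. (h0 OR h2),  (h1 OR h2)
```

With binarized inputs and weights the outputs are exact Boolean values, which is what makes the trained model readable as discrete rules.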
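Contribution 3 replaces per-node products with matrix multiplication. One general way to achieve this kind of decoupling (the dissertation's actual construction may differ) is the log-space trick: for binary weights and non-negative inputs, the selected product for each node equals exp(Wᵀ log h), so all nodes are computed by one dense matrix multiplication instead of many scattered element-wise reductions:

```python
import numpy as np

def conjunction_prod(h, W):
    # Element-wise product form: each output node reduces over all
    # inputs separately, which couples the computation per node.
    return np.prod(np.where(W > 0.5, h[:, None], 1.0), axis=0)

def conjunction_matmul(h, W, eps=1e-12):
    # Log-space form: for binary W,
    #   prod_i h_i^{w_ij} = exp(sum_i w_ij * log h_i) = exp(W.T @ log h),
    # a single dense matmul that maps well onto BLAS/GPU kernels.
    return np.exp(W.T @ np.log(np.clip(h, eps, 1.0)))

rng = np.random.default_rng(0)
h = rng.uniform(0.1, 1.0, size=8)              # soft truth values
W = (rng.uniform(size=(8, 4)) > 0.5).astype(float)  # binary selections
assert np.allclose(conjunction_prod(h, W), conjunction_matmul(h, W))
```

The `eps` clip guards `log(0)`; an input of exactly 0 would (correctly) drive the conjunction to 0 but produce `-inf` in log space.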
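The interpretability payoff in contribution 4 comes from the fact that, once weights are binarized, each conjunction node reads directly as an IF-condition over named features. A toy sketch of that read-off step, with invented feature names and weights (not taken from the dissertation's dementia study):

```python
# Hypothetical illustration: turning a binarized conjunction layer
# into readable rules.  Feature names and weights are made up.
features = ["age>65", "MMSE<24", "hippocampal_atrophy"]
W = [[1, 0],   # rule 0 uses age>65
     [1, 1],   # both rules use MMSE<24
     [0, 1]]   # rule 1 uses hippocampal_atrophy

def extract_rules(features, W):
    """Column j of the binary matrix W lists the conditions
    that rule j conjoins."""
    rules = []
    for j in range(len(W[0])):
        conds = [features[i] for i in range(len(features)) if W[i][j] == 1]
        rules.append("IF " + " AND ".join(conds))
    return rules

for rule in extract_rules(features, W):
    print(rule)
# IF age>65 AND MMSE<24
# IF MMSE<24 AND hippocampal_atrophy
```

Presenting candidate rule sets in this form is what lets doctors inspect, compare, and select rules, as the framework in contribution 4 describes.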