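Cell differentiation is one of the most fundamental biological processes. Stem cells are capable of self-renewal and give rise to various functional cells through differentiation, which is essential to the normal functioning of the whole organism. Reconstructing cell differentiation networks and identifying fate-determining features have long been important research topics. Existing computational methods are mainly similarity-based and tend to mine correlations between cell states, whereas the differentiation process is in essence driven by causal relationships. This study seeks to use different types of data to identify the causal relationships between cell types and thereby reconstruct cell differentiation networks. To overcome the shortcomings of current Bayesian network structure learning algorithms, this thesis proposes an improved hybrid strategy and develops CIBER, a causal inference framework for cell fate that can obtain robust, expected network structures from a small amount of prior knowledge and identify potential differentiation paths. The study then combines the learned network with a structural causal model and uses in silico perturbation experiments to construct a feature-effect matrix that measures the influence of each feature on each differentiation branch. Based on the feature-effect matrix, CIBER introduces differentiation driver feature (DDF) analysis, a new approach distinct from regular differential feature (RDF) analysis. DDF analysis captures information that RDF analysis does not, offering a new way to identify the key features that influence cell fate. From a graph perspective, RDF analysis mines "node" information whereas DDF analysis mines "edge" information, and the two complement each other to reflect the whole differentiation process more completely. DDF analysis can identify key features that influence haematopoietic differentiation yet show no statistical difference, and this was validated through in vivo experiments in mice using the transcription factor Bcl11b as an example. In addition, this study proposes a new pseudotime estimation algorithm that incorporates causal information. Results on multiple datasets show that, compared with existing methods, CIBER estimates reasonable and smoother pseudotime. CIBER is a broadly applicable causal inference framework for cell fate analysis that works on different types of data, whether continuous and non-negative or discrete and real-valued, such as single-cell or bulk RNA-seq, ATAC-seq, and microarray data. In summary, this study developed CIBER, a robust causal inference framework for cell fate that can reconstruct cell differentiation networks from a small amount of prior knowledge; constructed a feature-effect matrix to measure the influence of each feature on each differentiation branch; proposed a new key-feature identification algorithm that can identify fate-determining features showing no statistical difference and that complements existing differential analysis methods to give a fuller picture of the biological process; and further proposed a new pseudotime estimation method that yields reasonable and smoother cell differentiation trends.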
Cellular differentiation is among the most fundamental biological processes. Stem cells can self-renew and differentiate into various functional cells that are essential for the proper functioning of the entire body. Therefore, the reconstruction of cellular differentiation networks and the identification of fate-determining features have always been important research topics. Most existing computational methods for cell fate analysis are similarity-based and tend to identify correlations between different cell states, whereas the process of cellular differentiation is fundamentally driven by causality. This study aims to use different kinds of data to reconstruct cellular differentiation networks by identifying causal relationships between cell types. To address the shortcomings of existing Bayesian network structure learning algorithms, we introduce an improved hybrid strategy and develop CIBER, a causal inference-based framework for cell fate analysis that obtains robust and feasible structures with minor prior knowledge and identifies potential differentiation branches. By combining the learned network with a structural causal model and applying in silico perturbation, we construct a feature-effect matrix that quantifies the impact of each feature on each branch of cell differentiation. Based on the effect matrix, CIBER then uses differentiation driver feature (DDF) analysis to identify cell fate-determining features. DDF analysis captures information different from that of regular differential feature (RDF) analysis, providing new insights into the identification of features affecting the cellular differentiation process. From a graph perspective, RDF analysis mines “node” information whereas DDF analysis mines “edge” information, and the two complement each other to reflect the entire cellular differentiation process more comprehensively.
We demonstrate that DDF analysis can identify features crucial to haematopoiesis that show no significant difference between lineages, and we validate this for the transcription factor Bcl11b through in vivo experiments in mice. Furthermore, CIBER provides a novel pseudotime estimation method based on causal information. Applications to several datasets show that CIBER obtains reasonable and smoother pseudotime trends than existing trajectory inference methods. CIBER is a widely applicable causal inference framework for cell fate analysis that works on different types of data, whether continuous and non-negative or discrete and real-valued, such as single-cell or bulk RNA-seq, ATAC-seq, and microarray data. In summary, in this study we developed CIBER, a causal inference framework for cell fate analysis that can reconstruct robust cellular differentiation networks with minor prior knowledge; built a feature-effect matrix to quantify the impact of each feature on each differentiation branch; introduced differentiation driver feature analysis, which can identify important features showing no significant difference between lineages and which complements regular differential feature analysis to reflect biological processes more comprehensively; and provided a new pseudotime estimation method that obtains reasonable and smoother cell differentiation trends than existing methods.
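The idea of a feature-effect matrix built by in silico perturbation can be illustrated with a minimal sketch. It is not CIBER's actual implementation: it assumes a linear structural causal model fitted by least squares, models "perturbation" as leaving one feature out, and uses a toy three-cell-type network. The function names `edge_weights` and `feature_effect_matrix`, the toy data, and the cell-type indices are all hypothetical.

```python
import numpy as np

def edge_weights(X, parents):
    """Fit a linear SCM: regress each child cell type on its parents.

    X       : array (n_features, n_cell_types), one row per feature (e.g. gene).
    parents : dict mapping child column index -> list of parent column indices.
    Returns a dict mapping (parent, child) -> fitted coefficient.
    """
    weights = {}
    for child, pa in parents.items():
        if not pa:
            continue  # root cell type, nothing to fit
        # Design matrix: parent columns plus an intercept column.
        A = np.column_stack([X[:, pa], np.ones(X.shape[0])])
        coef, *_ = np.linalg.lstsq(A, X[:, child], rcond=None)
        for p, c in zip(pa, coef[:-1]):
            weights[(p, child)] = c
    return weights

def feature_effect_matrix(X, parents):
    """Assumed perturbation: drop one feature, refit, record the change
    in every edge weight. Returns (edges, effects) where effects has
    shape (n_features, n_edges)."""
    base = edge_weights(X, parents)
    edges = sorted(base)
    effects = np.zeros((X.shape[0], len(edges)))
    for i in range(X.shape[0]):
        X_pert = np.delete(X, i, axis=0)       # in silico removal of feature i
        pert = edge_weights(X_pert, parents)
        effects[i] = [pert[e] - base[e] for e in edges]
    return edges, effects

# Toy network: cell type 0 differentiates into cell types 1 and 2.
parents = {0: [], 1: [0], 2: [0]}
root = np.arange(6.0)
branch1 = 2.0 * root + 1.0
branch2 = -1.0 * root + 3.0
branch1[5] += 10.0                             # feature 5 deviates on branch 0 -> 1 only
X = np.column_stack([root, branch1, branch2])

edges, effects = feature_effect_matrix(X, parents)
driver = int(np.argmax(np.abs(effects[:, 0]))) # strongest effect on edge (0, 1)
```

In this toy setting each row of `effects` describes one feature and each column one differentiation edge, so ranking a column by absolute value gives an edge-level ("DDF-like") ranking, in contrast to a node-level comparison of expression between two cell types.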