With advances in sequencing technology, RNA polymerase, nascent mRNA, and histone modification signals associated with active transcription can now be located and quantified genome-wide, and divergent transcription has been found to be widespread in eukaryotic genomes. Previous studies, however, did not define divergent transcription clearly, and only a small fraction of the events they reported show both active transcription and open chromatin. Those studies also concentrated on a small number of human cell lines or on animal and plant genomes, and analyses of regulatory mechanism and function were restricted to individual divergently transcribed genes rather than carried out systematically.

In the present work, we redefine divergent transcription as a pair of closely spaced, active transcription initiation events on opposite strands that lie within the same open chromatin region. By combining nascent RNA sequencing (GRO-seq), H3K4me3 sequencing, and DNase I hypersensitive site sequencing (DNase-seq) data, we developed a new pipeline for de novo identification and annotation of divergent transcription events in 12 types of human cells. Compared with previous work, we identified a large number of specific divergent transcription events, including unannotated nascent transcription start sites, and the abundance of divergent transcription events in the different data types varied with their annotation status. We also compared divergent transcription across cell systems for the first time and found it to be strongly cell-specific: relative to pairs shared among cell lines, cell-specific pairs showed weaker sequence conservation and performed more cell-related functions. Interestingly, although the similarity of the divergent transcription pairs identified here (those sharing one open chromatin region) was low across cell lines, their similarity coefficient was significantly higher than that of independent divergent transcription events (those with separate open chromatin regions, where the intervening chromatin is wrapped in nucleosomes), suggesting that divergent transcription tends to co-occur as pairs across cell lines.
We further explored the expression patterns, biological significance, and regulatory relationships of divergent transcription pairs in non-small cell lung cancer (NSCLC). Some divergent transcription pairs indicated the prognosis of NSCLC patients through several modes of interaction between their two members, and transcriptional regulatory elements affected the correlation between the members of a pair in either a promoting or an inhibiting manner. Because many lncRNAs participate in divergent pairs, and some pairs consist of one lncRNA and one mRNA, we also systematically studied the regulatory relationships between lncRNAs and their target genes in multiple cancers to explore the role of lncRNAs in disease progression.

In summary, we propose a new strategy for precise de novo identification of divergent transcription events in multiple human cell lines and provide a rich resource of divergent transcription pairs, together with their associated features, for follow-up studies. In addition, the systematically identified regulatory relationships between modulators and divergent transcription pairs, and between lncRNAs and their target genes, lay a foundation for better understanding the regulatory mechanisms of divergent transcription and the biological functions of lncRNAs in disease progression.
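To make the definition used in this abstract concrete (two closely spaced transcription start events on opposite strands within one open chromatin region), the following Python sketch pairs candidate start sites. It is only an illustration under stated assumptions, not the pipeline developed in this work: the names `TSS`, `Region`, and `find_divergent_pairs` are hypothetical, the 1000 bp distance cutoff is an arbitrary placeholder that the abstract does not specify, and H3K4me3 support is not modeled.

```python
from collections import namedtuple

# Hypothetical record types for this illustration only.
TSS = namedtuple("TSS", "chrom pos strand")        # one active transcription start event
Region = namedtuple("Region", "chrom start end")   # open chromatin, half-open [start, end)


def find_divergent_pairs(tss_list, open_regions, max_dist=1000):
    """Return (minus_tss, plus_tss) pairs that share one open chromatin region.

    A pair is kept when the minus-strand start lies upstream (at a smaller
    coordinate) of the plus-strand start, so the two transcripts point away
    from each other, and the two starts are at most max_dist bp apart.
    max_dist is an assumed placeholder, not a value taken from the study.
    """
    pairs = []
    for region in open_regions:
        # Keep only start events that fall inside this open region.
        inside = [
            t for t in tss_list
            if t.chrom == region.chrom and region.start <= t.pos < region.end
        ]
        minus = [t for t in inside if t.strand == "-"]
        plus = [t for t in inside if t.strand == "+"]
        for m in minus:
            for p in plus:
                # Divergent orientation: minus start upstream of plus start.
                if 0 < p.pos - m.pos <= max_dist:
                    pairs.append((m, p))
    return pairs
```

Two start events that are close together but fall in separate open regions are never paired by this sketch, which corresponds to the "independent" events that the abstract contrasts with shared-region pairs.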