Massively Parallel Construction, Characterization and Machine Learning-Assisted Combinatorial Optimization of Genetic Circuits in Saccharomyces cerevisiae

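Author: Zhou Yikang (周益康)
  • Student ID
    2015******
  • Degree
    Doctorate
  • Email
    188******com
  • Defense date
    2020.07.17
  • Advisor
    Zhang Chong (张翀)
  • Discipline
    Chemical Engineering and Technology
  • Pages
    139
  • Confidentiality level
    Public
  • Department
    034 Department of Chemical Engineering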
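  • Chinese keywords (translated)
    Genetic circuits, standardized assembly, machine learning, heterologous metabolic pathways, biosensors
  • English keywords
    Genetic circuits, Standardized assembly, Machine learning, Heterologous synthetic pathway, Metabolite biosensor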

Abstract

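Genetic circuits such as metabolic pathways and regulatory networks play an important role in building microbial cell factories. To make a genetic circuit perform as intended, a combinatorial optimization strategy is usually adopted: multiple rounds of iterative design, construction, and testing are used to search for the global optimum in the circuit's combinatorial space. In recent years, rapid progress in synthetic biology has greatly increased the capacity to build combinatorial optimization libraries, but phenotypic testing remains a bottleneck. This thesis addresses two key scientific questions: "how to test the genotype-phenotype associations of genetic circuits in a massively parallel manner" and "how to use genotype-phenotype association data to guide genetic circuit design". Taking heterologous metabolic pathways and transcription factor-based metabolite biosensors in Saccharomyces cerevisiae as examples, it develops a standard workflow for massively parallel construction and characterization of genetic circuits and applies machine learning-assisted optimization strategies to search for the optimal solution in the combinatorial space.

For the optimization of heterologous metabolic pathways, the ANN Ensemble algorithm was applied on top of the YeastFab standardized assembly strategy to predict advantageous combinations, giving a machine learning-assisted workflow for finding the optimum of a pathway's combinatorial space. The workflow successfully optimized the yields of the target products of the heterologous β-carotene and violacein synthesis pathways in S. cerevisiae, achieved dual-objective optimization of violacein yield and purity, and produced strains that efficiently synthesize high-purity violacein.

For the optimization of metabolite biosensors, the variables to be optimized in combination include the expression level of the transcription factor, the insertion site of its recognition sequence, and the promoter strength of the output reporter gene. A standardized assembly workflow for trackable combinatorial libraries of biosensor circuits was therefore established, which condenses the combination information with high fidelity into a molecular barcode whose total length lies within the read length of second-generation sequencing. Using the malonyl-CoA biosensor based on the FapR-fapO system as a model, a trackable library of 5184 distinct combinations was constructed in a massively parallel manner, covering 6 transcription factor expression levels, 4 operator insertion sites, and 216 upstream enhancer sequences. Coupled FACS-seq was used to measure the fluorescence output of 2632 of these biosensors at 6 effector concentrations, enabling efficient acquisition of large-scale genotype-phenotype data. Machine learning algorithms were then successfully applied to predict the fluorescence output of the remaining combinations, yielding a panoramic scan of the entire combinatorial space of the biosensor circuit. With this strategy, a malonyl-CoA biosensor with the largest dynamic response range reported in the literature was obtained.

Building on rapidly developing standardized nucleic acid assembly technology, this thesis obtains genotype-phenotype association data through massively parallel testing and thereby adds a learning step to the traditional design-build-test cycle, offering a new approach to the combinatorial optimization of complex genetic circuits in S. cerevisiae.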
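The library size quoted in the abstract follows directly from the three design variables (6 × 4 × 216 = 5184). The sketch below enumerates that space and computes the fraction covered by the 2632 characterized biosensors. The level labels and the integer used as a barcode stand-in are hypothetical; the thesis encodes each combination in a DNA barcode whose design is not given here.

```python
from itertools import product

# Level counts come from the abstract; the labels themselves are hypothetical placeholders.
TF_LEVELS = [f"TF{i}" for i in range(1, 7)]            # 6 transcription factor expression levels
OPERATOR_SITES = [f"site{i}" for i in range(1, 5)]     # 4 operator insertion sites
UAS_DESIGNS = [f"UAS{i:03d}" for i in range(1, 217)]   # 216 upstream sequence designs


def build_library():
    """Enumerate every design and key it by an integer standing in for a barcode."""
    designs = product(TF_LEVELS, OPERATOR_SITES, UAS_DESIGNS)
    return {index: design for index, design in enumerate(designs)}


library = build_library()
characterized = 2632  # biosensors with measured response curves, per the abstract
coverage = characterized / len(library)

print(len(library))          # 5184
print(f"{coverage:.1%}")     # 50.8%
```

Roughly half of the combinatorial space was measured directly, which is why a predictive model is needed to fill in the remaining designs.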

With the rapid development of synthetic biology, well-designed genetic circuits, such as metabolic pathways and regulatory networks, play important roles in the construction of highly efficient microbial cell factories. To optimize the function of a genetic circuit in a workhorse host, combinatorial optimization is usually carried out through iterative design-build-test cycles to find the optimal solution in the combinatorial space. Rapid advances in DNA synthesis and assembly technologies have boosted the capacity to construct combinatorial genetic circuits, but the capacity for phenotypic testing still lags far behind. In this thesis, we focused on the combinatorial optimization of two typical genetic circuits in Saccharomyces cerevisiae: heterologous synthetic pathways and transcription factor (TF)-based metabolite biosensors. To bridge the gap between genotype construction and phenotype testing, and to generate large-scale genotype-phenotype association data to guide the design of genetic circuits, we developed a standard workflow for massively parallel construction and characterization of the target genetic circuits and applied machine learning algorithms to find the optimal solution in the combinatorial space. We first developed an efficient machine learning workflow in conjunction with the YeastFab assembly strategy (MiYA) for combinatorial optimization of heterologous metabolic pathways in S. cerevisiae, which uses standardized pathway assembly and an ANN Ensemble algorithm to prioritize all possible designs of a combinatorial heterologous pathway.
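As a rough illustration of how an ensemble of small neural networks can prioritize untested pathway designs, the sketch below trains several one-hidden-layer networks on bootstrap resamples of a few measured designs and ranks the rest by the averaged prediction. Everything specific here is an assumption made for illustration: the toy pathway (3 genes, 3 promoter choices each), the made-up `toy_titer` function standing in for measurements, the network size, and the use of bootstrap averaging. The abstract does not describe the architecture or training procedure actually used in MiYA.

```python
import math
import random
from itertools import product

N_GENES, N_PROMOTERS = 3, 3  # hypothetical toy pathway, not the thesis's pathways


def one_hot(design):
    """Encode a design such as (0, 2, 1) as a flat one-hot vector."""
    vec = [0.0] * (N_GENES * N_PROMOTERS)
    for gene, promoter in enumerate(design):
        vec[gene * N_PROMOTERS + promoter] = 1.0
    return vec


def toy_titer(design):
    """Made-up response surface standing in for a measured titer."""
    a, b, c = design
    return 1.0 + a + 0.5 * b - 0.8 * (c - 1) ** 2 + 0.3 * a * b


class TinyNet:
    """One-hidden-layer tanh network trained with plain stochastic gradient descent."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self._hidden(x))) + self.b2

    def fit(self, data, rng, epochs=300, lr=0.02):
        for _ in range(epochs):
            for x, y in rng.sample(data, len(data)):  # shuffled pass over the data
                h = self._hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
                for j, hj in enumerate(h):
                    # Gradient through tanh, computed before w2[j] is updated.
                    grad_h = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * grad_h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_h * xi
                self.b2 -= lr * err


def train_ensemble(train, n_models=5, seed=0):
    """Train each member on its own bootstrap resample with its own random seed."""
    models = []
    for k in range(n_models):
        rng = random.Random(seed + k)
        boot = [rng.choice(train) for _ in train]
        net = TinyNet(len(train[0][0]), 6, rng)
        net.fit(boot, rng)
        models.append(net)
    return models


def ensemble_predict(models, x):
    """Average the member predictions."""
    return sum(m.predict(x) for m in models) / len(models)


all_designs = list(product(range(N_PROMOTERS), repeat=N_GENES))
tested = random.Random(42).sample(all_designs, 12)   # designs assumed to be measured
train = [(one_hot(d), toy_titer(d)) for d in tested]
models = train_ensemble(train)

untested = [d for d in all_designs if d not in tested]
ranked = sorted(untested, key=lambda d: ensemble_predict(models, one_hot(d)), reverse=True)
print(ranked[:3])  # top candidates to build and test in the next round
```

Averaging over several independently trained members is one common way to reduce the variance of predictions made from a small training set. The top-ranked designs would then be built and measured, and the results fed back in as new training data.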
MiYA proved effective not only for optimizing the titer of the desired products (β-carotene and violacein) but also for optimizing purity and titer simultaneously, which yielded a strain producing violacein at the highest purity reported so far (purity > 99%). For metabolite biosensors, the promoter driving TF expression, the operator design, and the reporter promoter strength all significantly affect response performance. We therefore chose the malonyl-CoA biosensor in S. cerevisiae as a model system and developed a trackable assembly workflow to construct highly ordered, barcoded combinations, condensing the genotypic information of each multiply regulated biosensor into a short DNA barcode. The trackable combinatorial library contained 5184 combinations, spanning 6 levels of TF dosage, 4 operator positions, and 216 possible UAS designs. We then applied FACS-seq to characterize this library in a massively parallel manner: the response curves of 2632 of the 5184 biosensors were successfully characterized, providing large-scale genotype-phenotype association data for the designed biosensors. Finally, machine learning algorithms were applied to predict the genotype-phenotype relationships of the uncharacterized combinations, generating a panoramic map of the combinatorial space. With this workflow, a malonyl-CoA biosensor with the largest dynamic response range reported so far was obtained. Using large-scale genotype-phenotype association data as input, we added a learning step to the traditional design-build-test cycle, which offers a new approach to the combinatorial optimization of complex genetic circuits in S. cerevisiae.
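To make the FACS-seq step concrete, the sketch below shows one common way to turn barcode read counts across sorting bins into a fluorescence estimate, and then a response curve into a dynamic range. The weighting scheme, the fold-change definition of dynamic range, the bin values, and all read counts are assumptions chosen for illustration. The abstract does not specify how the thesis computed these quantities.

```python
def estimate_log_fluorescence(counts, bin_totals, cell_fractions, bin_log_means):
    """Weighted mean of bin log10 fluorescence for one barcode.

    Assumed weighting: reads in each bin are normalized by that bin's sequencing
    depth and scaled by the fraction of sorted cells that fell into the bin.
    """
    weights = [frac * count / total
               for count, total, frac in zip(counts, bin_totals, cell_fractions)]
    weight_sum = sum(weights)
    if weight_sum == 0:
        return None  # barcode not observed in any bin
    return sum(w * m for w, m in zip(weights, bin_log_means)) / weight_sum


def dynamic_range(log_outputs):
    """Fold change between the highest and lowest output (an assumed definition)."""
    return 10 ** (max(log_outputs) - min(log_outputs))


# Hypothetical sort: 4 bins with equal sequencing depth and equal cell fractions.
BIN_LOG_MEANS = [2.0, 3.0, 4.0, 5.0]
BIN_TOTALS = [1000, 1000, 1000, 1000]
CELL_FRACTIONS = [0.25, 0.25, 0.25, 0.25]

# Hypothetical read counts for one barcode at three effector concentrations.
reads_by_concentration = {
    "low": [80, 20, 0, 0],
    "medium": [10, 50, 40, 0],
    "high": [0, 0, 20, 80],
}

response_curve = {
    conc: estimate_log_fluorescence(counts, BIN_TOTALS, CELL_FRACTIONS, BIN_LOG_MEANS)
    for conc, counts in reads_by_concentration.items()
}
fold_change = dynamic_range(list(response_curve.values()))

print(response_curve)
print(round(fold_change, 1))
```

Because every barcode maps back to a known combination of TF dosage, operator position, and UAS design, repeating this calculation per barcode gives one response curve per genotype, which is the genotype-phenotype table a downstream model would be trained on.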