Computer-aided drug design can greatly shorten the development cycle of new drugs and reduce development costs. With the advance of artificial intelligence, deep learning is expected to become a new driving force for computer-aided drug design. The core goal of current work in this direction is to develop new algorithms and improve prediction accuracy. Graph neural networks, an emerging deep learning technique, perform well on graph data and are well suited to modeling and representing the graph structure of small molecules. Building on graph neural networks, this thesis explores their application to molecular graph representation learning and to four areas of computer-aided drug design: molecular property prediction, drug-target interaction, drug-drug interaction, and molecular design optimization, with the aim of improving prediction accuracy on tasks related to drug development.

This thesis first analyzes the problems of existing graph neural networks in molecular property prediction and proposes a new graph neural network architecture, TrimNet, which strengthens the extraction of chemical bond information from molecules. It also examines how modules such as layer normalization and residual connections affect the performance of graph networks. TrimNet is applied to end-to-end prediction of molecular quantum chemical properties, biological activity, and physiological properties. The results show that, compared with existing models, TrimNet achieves higher prediction accuracy while greatly reducing the number of parameters.

To address the dependence of supervised learning on labeled data, this thesis then explores self-supervised learning for molecular representation. It proposes a molecular graph-level self-supervised strategy, Pairwise Half-Graph Discrimination (PHD), for pretraining graph neural networks on unlabeled molecular data. Experimental results show that PHD outperforms existing graph self-supervised strategies and is effective across a variety of common graph neural network models. Next, we design a deep graph network model, MolGNet, and pretrain it on large-scale unlabeled molecules using this self-supervised strategy. We find that the pretrained MolGNet learns to encode some basic chemical knowledge, including molecular validity, scaffold information, and quantum information. We then transfer MolGNet to a variety of computer-aided drug design tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, where the model sets new state-of-the-art accuracy on twelve drug-related datasets.

Finally, for the task of molecular optimization and design, this thesis proposes a simulated annealing algorithm fused with MolGNet (SAMO). On the task of optimizing the hydrophilic-hydrophobic properties of molecules, SAMO outperforms the current best methods: under the same molecular similarity constraints, it designs molecules with more desirable properties.

In summary, this thesis takes graph neural networks as its core technology, constructs effective graph network variants tailored to molecular graph structure, and adopts different supervised and self-supervised learning strategies. These methods improve on the prediction accuracy of existing algorithms across multiple computer-aided drug design tasks, provide new ideas and new algorithms for drug research and development, and promote the development and application of artificial intelligence in that field.
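The abstract does not describe how SAMO works internally, so the following is only a generic illustration of simulated annealing under a similarity constraint, not the thesis's method. As assumptions made purely for this sketch, a molecule is represented as a tuple of 0/1 fingerprint bits, `toy_score` is a hypothetical stand-in for a learned property predictor such as MolGNet, a proposal flips one bit, and the constraint is a Tanimoto similarity threshold to the starting point.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto similarity between two equal-length sequences of 0/1 bits."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0


def toy_score(bits):
    """Hypothetical property score (assumption): +1 for each set bit at an
    even index, -1 for each set bit at an odd index."""
    return sum(b if i % 2 == 0 else -b for i, b in enumerate(bits))


def anneal(start, score, min_similarity=0.4, steps=2000,
           t0=1.0, cooling=0.995, seed=0):
    """Maximize `score` by simulated annealing while keeping every accepted
    state at least `min_similarity` similar to `start`."""
    rng = random.Random(seed)
    start = tuple(start)
    current = best = start
    temperature = t0
    for _ in range(steps):
        # Propose a neighbor by flipping one randomly chosen bit.
        i = rng.randrange(len(current))
        candidate = current[:i] + (1 - current[i],) + current[i + 1:]
        # Discard proposals that violate the similarity constraint.
        if tanimoto(candidate, start) >= min_similarity:
            delta = score(candidate) - score(current)
            # Metropolis rule: always accept improvements; accept a worse
            # state with probability exp(delta / temperature).
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current = candidate
                if score(current) > score(best):
                    best = current
        temperature *= cooling  # geometric cooling schedule
    return best


if __name__ == "__main__":
    start = (1, 0, 1, 1, 0, 0, 1, 1)
    best = anneal(start, toy_score)
    print(best, toy_score(best), round(tanimoto(best, start), 3))
```

In a realistic setting the proposal step would edit a molecular graph and the score would come from a trained model; the acceptance rule and cooling schedule shown here are the standard textbook form and may differ from the choices made in the thesis.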