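In recent years, deep learning models built mainly on neural networks have achieved breakthroughs in many machine learning tasks. Deep learning models typically learn hierarchical feature representations automatically, in an end-to-end manner. On the one hand, the features they learn are highly discriminative and robust for the target task. On the other hand, compared with traditional methods, deep learning models have more parameters and more complex structures, and they require more training data. Deep neural networks are the mainstream deep learning model, and their structure, including the depth and width of the model, is a key factor in their performance. For a long time, however, the structure of deep neural networks has been tuned mainly by hand, based on human experience. Manual design hinders both the adoption of these techniques and the reproduction of results; moreover, whenever the data or the problem changes, considerable human effort must again be spent searching for a suitable structure. As important hyperparameters, the depth of a deep neural network, the width of each hidden layer, and the way units are connected pose a discrete optimization problem. This problem is important, yet it has received relatively little research attention. This thesis therefore proposes a principled and effective set of algorithms specifically for optimizing the structure of deep neural networks. In practical applications, the optimized convolutional and recurrent neural networks perform significantly better than their unoptimized counterparts. The main contributions of this thesis are:
(1) A neural network structure optimization algorithm based on submodularity and supermodularity. The thesis introduces structure encoding and function transformation techniques that convert structure optimization into a subset selection problem, and it proposes both constrained and unconstrained optimization algorithms, together with theoretical analysis and proofs.
(2) Surrogate functions based on Gaussian processes and on submodular and supermodular upper and lower bounds, which speed up the optimization algorithms.
(3) Optimized convolutional neural network structures that achieve the best results on traffic sign recognition, and optimized recurrent neural network structures that achieve the best results on image captioning.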
Recently, deep learning models have achieved breakthroughs in many machine learning fields. Deep models automatically learn hierarchical features through end-to-end learning. On the one hand, features learned by deep models are more discriminative and robust; on the other hand, deep models have more parameters and more complex structures, and they need more training data. The structure of a deep neural network, including its depth and width, is a key factor in the model's performance. However, deep neural network structures are mainly tuned by hand based on expert experience. This makes results hard to reproduce, and whenever the data or task changes, re-tuning the model costs considerable time. Choosing the depth of a deep neural network, the widths of its hidden layers, and the way its units are connected is a discrete optimization problem that is important but has received little research attention. This thesis therefore proposes a framework for optimizing the structure of deep neural networks. In real applications, the optimized convolutional and recurrent neural networks show clear improvements in performance. The main contributions of this thesis are:
(1) Deep neural network structure optimization using submodularity and supermodularity. The thesis proposes novel structure encoding and function transformation techniques that convert structure optimization into a subset selection problem, and presents constrained and unconstrained algorithms to solve it, with theoretical analysis and proofs.
(2) Surrogate functions based on Gaussian processes and on submodular or supermodular bounds, which speed up the optimization.
(3) Optimized convolutional neural networks that achieve state-of-the-art results on traffic sign recognition, and optimized recurrent neural networks that achieve state-of-the-art results on image captioning.
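To make the subset-selection view of structure optimization more concrete, here is a minimal Python sketch under assumptions of our own. The encoding (an element `(layer, k)` meaning "layer `layer` receives its k-th block of units"), the toy objective `toy_score`, and the helpers `decode` and `greedy_select` are all hypothetical illustrations, not the thesis's structure encoding or algorithm. The sketch applies the standard greedy algorithm for maximizing a monotone submodular function under a cardinality constraint, which is known to guarantee a (1 - 1/e) approximation. In practice the objective would be something like the validation accuracy of the decoded network, which is expensive to evaluate and not guaranteed to be submodular.

```python
import math
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical encoding: element (layer, k) = "layer `layer` gets its k-th block of units".
Element = Tuple[int, int]


def decode(subset: FrozenSet[Element]) -> Dict[int, int]:
    """Map a subset of elements to per-layer widths (number of blocks per layer).

    Layers with no selected blocks are absent, so the subset also encodes depth.
    """
    widths: Dict[int, int] = {}
    for layer, _ in subset:
        widths[layer] = widths.get(layer, 0) + 1
    return widths


def toy_score(subset: FrozenSet[Element]) -> float:
    """Hypothetical stand-in for validation accuracy.

    A sum of concave functions (sqrt) of per-layer counts is monotone submodular.
    """
    return sum(math.sqrt(w) for w in decode(subset).values())


def greedy_select(
    ground: FrozenSet[Element],
    f: Callable[[FrozenSet[Element]], float],
    budget: int,
) -> FrozenSet[Element]:
    """Greedily add the element with the largest marginal gain, up to `budget` elements."""
    selected: FrozenSet[Element] = frozenset()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for e in ground - selected:
            gain = f(selected | {e}) - f(selected)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no element improves the objective
            break
        selected = selected | {best}
    return selected


# Usage: 3 candidate layers, up to 4 blocks each, total budget of 6 blocks.
ground = frozenset((layer, k) for layer in range(3) for k in range(4))
chosen = greedy_select(ground, toy_score, budget=6)
print(sorted(decode(chosen).items()))  # [(0, 2), (1, 2), (2, 2)]
```

Because each additional block in a layer yields a smaller gain than the previous one (diminishing returns), the greedy rule always adds to the least-filled layer and spreads the budget evenly here; a different, non-concave objective would lead to a different allocation.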