In recent years, supported by big data, computing power, and machine learning algorithms, artificial intelligence has developed rapidly, with machine learning algorithms at its core. This dissertation selects two representative algorithms from two mainstream classes of machine learning for study: the random forest algorithm from ensemble learning and the deep neural network algorithm from deep learning. Addressing the limitations and deficiencies of these two classes of algorithms, the dissertation conducts in-depth research from both theoretical and applied perspectives and proposes corresponding solutions. The main work is summarized as follows: (1) To address the poor adaptability of the decision trees in random forests to data, this dissertation optimizes the splitting criterion and the tree-construction method from the perspective of Tsallis entropy. It first proposes a unified Tsallis splitting criterion that unifies the existing decision tree algorithms; on this basis, it further proposes a symmetric splitting criterion and a maximal-relevance-minimal-redundancy tree-construction method, which reduce the greediness of tree construction and improve the adaptability of decision trees to data. (2) To address the problem that the random feature-subspace selection mechanism of random forests yields too few informative features per subspace on high-dimensional data, this dissertation proposes a feature-subspace selection method that combines feature transformation with stratified sampling. It first gives a theoretical analysis of why random forests perform poorly on high-dimensional data, and then proposes a random forest algorithm based on feature transformation and stratified sampling, ensuring that every decision tree node contains enough informative features to learn from. The resulting random forest algorithm generalizes well on both low- and high-dimensional data. (3) To address the dilemma between the theoretical properties and the empirical performance of random forests, i.e., random forests with good empirical performance lack theoretical guarantees, while those with theoretical guarantees perform poorly in practice, this dissertation proposes a Bernoulli-controlled random forest algorithm that uses two Bernoulli distributions to help select the splitting attribute and splitting point at each node, i.e., with a certain probability it uses either a random process or a deterministic process to construct the decision trees of the forest. The proposed Bernoulli random forest not only has provable consistency but also achieves good empirical performance. (4) To address the poor performance of deep neural networks on data sets with noisy labels, this dissertation explains the learning process of deep neural networks from the perspective of subspace dimensionality. On data sets with noisy labels, neural networks follow a two-stage learning pattern: 1) an early dimension-compression stage, which learns a low-dimensional subspace closely matching the true data distribution, and 2) a later dimension-expansion stage, which gradually increases the subspace dimensionality to accommodate the noisy labels. Based on this finding, this dissertation proposes a dimensionality-driven learning strategy that adjusts the loss function to avoid the dimension-expansion stage of deep neural network learning. (5) To address the problem of robustly training deep neural networks under complex noisy labels, this dissertation proposes an iterative learning framework in which three modules, iterative noisy-label detection, discriminative feature learning, and reweighting, interact with and reinforce one another to learn an accurate data representation that not only separates the classes but also separates noisy samples from clean ones. The method does not rely on a noise model and is therefore applicable to the complex open-set noisy labels often encountered in real-world data.
The rapid development of artificial intelligence is inseparable from the support of big data, computing power, and machine learning algorithms. This dissertation focuses on machine learning algorithms, the core of artificial intelligence, and selects two representative algorithms from two popular families of machine learning: the random forest algorithm from ensemble learning and the deep neural network algorithm from deep learning. It addresses the limitations and deficiencies of these two families of algorithms, conducts in-depth research on both theoretical guarantees and practical applications, and proposes corresponding solutions. The main contributions are summarized as follows: (1) To address the greediness and poor adaptability of the decision trees in random forests, this dissertation optimizes the splitting criterion and the tree-construction method from the perspective of Tsallis entropy. It first proposes a unified Tsallis splitting criterion that unifies the existing decision tree algorithms. Building on this criterion, it further proposes a symmetric two-term splitting criterion and a maximal-orthogonality-maximal-relevance method for tree construction, which reduce the greediness and improve the adaptability of decision trees. (2) On high-dimensional data, the performance of random forests degenerates because the feature subspace randomly sampled at each node during tree construction contains too few informative features. To address this issue, this dissertation proposes a feature-subspace selection method that combines feature transformation with stratified sampling. It first analyzes theoretically why random forests perform poorly on high-dimensional data, and then proposes a random forest algorithm based on feature transformation and stratified sampling, ensuring that every decision tree node has enough informative features to learn from.
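As background for contribution (1): Tsallis entropy, S_q(p) = (1 - Σᵢ pᵢ^q)/(q - 1), recovers Shannon entropy in the limit q → 1 and the Gini impurity at q = 2, which is why a single Tsallis criterion can subsume the classical splitting criteria. A minimal sketch of such a criterion follows; this is an illustration of the general idea, not the dissertation's exact criterion, and the function names are hypothetical:

```python
import numpy as np

def tsallis_entropy(labels, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1) of a label sample.
    As q -> 1 it recovers Shannon entropy; q = 2 gives the Gini impurity."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    if abs(q - 1.0) < 1e-9:
        return -np.sum(p * np.log(p))          # Shannon limit of the formula
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def tsallis_gain(y, y_left, y_right, q):
    """Impurity reduction of a candidate split under the Tsallis criterion:
    parent entropy minus the size-weighted entropy of the two children."""
    n = len(y)
    weighted = (len(y_left) / n) * tsallis_entropy(y_left, q) \
             + (len(y_right) / n) * tsallis_entropy(y_right, q)
    return tsallis_entropy(y, q) - weighted
```

Tuning q interpolates between the familiar criteria (ID3/C4.5-style Shannon gain near q = 1, CART-style Gini at q = 2), which is one way a single parametric criterion can adapt to different data sets.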
The proposed random forest algorithm has good generalization performance regardless of whether the data are low- or high-dimensional. (3) To address the dilemma between the theoretical properties and the empirical performance of random forests, i.e., random forests with good empirical performance lack guaranteed theoretical properties, while those with guaranteed theoretical properties often perform poorly, this dissertation proposes a Bernoulli-controlled random forest algorithm that uses two Bernoulli distributions to help determine the splitting attribute and splitting point used at each node. Specifically, with a certain probability it uses either a random process or a deterministic process to construct each decision tree in the forest. The proposed Bernoulli random forest not only has provable consistency but also achieves good empirical performance. (4) To address the lack of interpretability of the performance degradation of deep neural networks on data sets with noisy labels, this dissertation explains the learning process of deep neural networks from the perspective of subspace dimensionality. It uses a dimensionality measure called local intrinsic dimensionality to analyze the representation subspace of the training samples, and shows that on data sets with noisy labels, deep neural networks follow two-stage learning: 1) an early dimension-compression stage, which learns a low-dimensional subspace closely matching the true data distribution, and 2) a later dimension-expansion stage, which gradually increases the subspace dimensionality to accommodate the noisy labels. Based on this finding, we propose a new training strategy called dimensionality-driven learning, which adjusts the loss function to avoid the dimension-expansion stage of learning in deep neural networks.
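The Bernoulli-controlled split selection of contribution (3) can be sketched as follows. This is an illustrative rendering rather than the dissertation's exact algorithm: the probabilities `p1`/`p2` and the Gini-based helper functions are assumptions made for the sketch.

```python
import random
import numpy as np

def gini(y):
    _, c = np.unique(y, return_counts=True)
    p = c / c.sum()
    return 1.0 - np.sum(p ** 2)

def gain(x, y, t):
    """Gini impurity reduction of splitting feature values x at threshold t."""
    left, right = y[x <= t], y[x > t]
    if len(left) == 0 or len(right) == 0:
        return -np.inf
    n = len(y)
    return gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

def best_threshold(x, y):
    cands = np.unique(x)[:-1]
    return max(cands, key=lambda t: gain(x, y, t)) if len(cands) else x[0]

def choose_split(X, y, p1=0.05, p2=0.05, rng=None):
    """With probability p1 the splitting attribute is drawn uniformly at random
    (the 'random process'); otherwise it is chosen greedily by impurity
    reduction (the 'deterministic process'). A second Bernoulli variable does
    the same for the splitting point on the chosen attribute."""
    rng = rng or random.Random(0)
    # Bernoulli draw 1: attribute selection
    if rng.random() < p1:
        j = rng.randrange(X.shape[1])
    else:
        j = max(range(X.shape[1]),
                key=lambda k: gain(X[:, k], y, best_threshold(X[:, k], y)))
    # Bernoulli draw 2: split-point selection
    vals = np.unique(X[:, j])
    if rng.random() < p2 and len(vals) > 1:
        t = rng.choice(list(vals[:-1]))
    else:
        t = best_threshold(X[:, j], y)
    return j, t
```

Keeping p1 and p2 small makes the tree mostly greedy (good empirical accuracy), while the residual randomness is what makes consistency arguments tractable.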
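Contribution (4) rests on measuring local intrinsic dimensionality (LID). A commonly used maximum-likelihood estimator of LID from nearest-neighbour distances can be sketched as follows (illustrative; the choice of `k` and the plain Euclidean distance are assumptions, not the dissertation's exact implementation):

```python
import numpy as np

def lid_mle(x, neighbors, k=3):
    """Maximum-likelihood LID estimate of point x from its k nearest neighbours:
        LID(x) ~= -( (1/k) * sum_{i=1..k} log(r_i / r_k) )^{-1},
    where r_1 <= ... <= r_k are the distances from x to its neighbours.
    A small LID means x lies on a locally low-dimensional subspace."""
    d = np.linalg.norm(neighbors - x, axis=1)
    r = np.sort(d)[:k]
    r = r[r > 0]                     # drop exact duplicates of x
    return -1.0 / np.mean(np.log(r / r[-1]))
```

Tracking the average of such estimates over the training representations is what exposes the two stages described above: the estimate first falls (dimension compression) and later rises (dimension expansion) when labels are noisy.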
(5) To address the problem of robustly training deep neural networks under complex noisy labels, this dissertation proposes an iterative learning framework comprising three modules: iterative noisy-label detection, discriminative feature learning, and reweighting. These modules benefit from one another and are jointly enhanced, learning a representation that not only separates the class categories but also separates noisy samples from clean ones. The framework does not rely on a noise model and is therefore a more flexible way to train deep neural networks; in particular, it can handle the complex open-set noisy labels often encountered in real-world data, where a noisy sample's true class is not contained in the set of known classes of the training data.
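The way detection and reweighting can reinforce each other over iterations can be illustrated with a toy loop. This is purely a sketch: distance to the class centroid stands in for the dissertation's noisy-label detector, and the exponential soft-weighting rule is an assumption made for the example.

```python
import numpy as np

def iterative_reweight(features, labels, n_iters=3):
    """Toy iterate-detect-reweight loop. Per iteration: (1) compute weighted
    class centroids, (2) flag likely-noisy samples as those far from their own
    class centroid, (3) down-weight them so the next centroids are cleaner,
    mimicking how detection and representation learning reinforce each other."""
    w = np.ones(len(labels))
    for _ in range(n_iters):
        centroids = {c: np.average(features[labels == c], axis=0,
                                   weights=w[labels == c])
                     for c in np.unique(labels)}
        dist = np.array([np.linalg.norm(f - centroids[c])
                         for f, c in zip(features, labels)])
        # Soft reweighting: far-from-centroid (likely noisy) samples get small weights.
        w = np.exp(-dist / (dist.mean() + 1e-12))
    return w
```

In the full framework the representation itself is also retrained with a discriminative loss each round, so clean and noisy samples drift apart in feature space and the detector becomes progressively more reliable.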