With the development of industrial technology, a wide variety of robots have been designed to accomplish diverse tasks. In recent years, artificial intelligence algorithms represented by deep reinforcement learning have achieved major progress and breakthroughs in fields such as robot motion control and navigation. However, such algorithms suffer from poor robustness, sensitivity to hyperparameters, and a tendency to become trapped in local optima. On the other hand, evolutionary algorithms, owing to their high robustness and strong global search capability, have become competitive with deep reinforcement learning in solving high-dimensional control problems. Nevertheless, the two methods have complementary characteristics. From a biological perspective, the well-known Baldwin effect suggests that individual learning can increase the evolutionary advantage of a species, while higher-order population-level advantages in turn facilitate individual learning. This theory offers the following insight: a hybrid algorithm that fully combines the two paradigms may bring new ideas to the complex optimization problem of robot motion control.

Constructing such hybrid algorithms involves two major difficulties. First, evaluating the fitness of the population requires every control policy to interact repeatedly with the real environment or a simulator, consuming substantial computational resources and limiting the applicability of hybrid algorithms to large-scale and computationally expensive problems. Second, although population-based interaction yields abundant learning samples, the diversity of the population is not explicitly controlled, so large numbers of duplicate samples are reused repeatedly, hindering performance improvement. Moreover, owing to evaluation bias and the uncertainty of genetic operations, the algorithm faces considerable fluctuations during optimization.

To address these problems, this paper first proposes SERL, a surrogate-assisted evolutionary reinforcement learning algorithm. SERL introduces a surrogate model and two model management strategies with an elite protection mechanism to improve the efficiency of fitness evaluation and the convergence quality of the algorithm. Building on SERL, this paper further proposes MSERL, a multi-objective surrogate-assisted evolutionary reinforcement learning algorithm. Based on a definition of policy diversity and the efficient fitness evaluation provided by the surrogate model, MSERL controls the population in a multi-objective optimization manner to generate high-quality and diverse learning samples, promoting the exploitation of the reinforcement learning policy and the exploration of the population in the optimization space. Experiments on a series of robot motion control tasks show that SERL offers greater advantages in sample efficiency, computation time, robustness, and convergence quality than standalone evolutionary algorithms, deep reinforcement learning algorithms, and existing hybrid algorithms, while MSERL further improves the convergence speed and final performance on top of SERL.

Finally, targeting the quality-similarity diversity problem that arises in practical scenarios, this paper builds the diversity generation tool QSD-TOOL on top of the SERL method. The tool can efficiently generate a set of diverse policies that are of interest to users and of similar quality, and it has been successfully applied to robot motion control and game AI design.
With the development of industrial technology, various robots have been designed to accomplish diverse tasks. In recent years, artificial intelligence algorithms, represented by Deep Reinforcement Learning (DRL), have made significant progress and breakthroughs in robot motion control, navigation and other fields. However, such algorithms generally suffer from poor robustness, high sensitivity to hyperparameters and susceptibility to local optima. On the other hand, Evolutionary Algorithms (EAs), owing to their high robustness, strong global search ability and flexibility, have become competitive with DRL methods in solving high-dimensional control problems. Nevertheless, the two methods have complementary characteristics. From a biological perspective, the famous Baldwin effect suggests that individual learning can increase the evolutionary advantage of the species, while higher-order population advantages are more conducive to individual learning. This theory offers the following insight: combining EAs and DRL can fully exploit the strengths of both and compensate for the shortcomings of either algorithm alone, bringing new ideas for solving robot motion control problems.

However, constructing such hybrid algorithms involves two difficulties. First, the fitness evaluation of the genetic population requires every control policy to interact with the real environment or a simulator, incurring a large computational cost that limits the applicability of hybrid algorithms to large-scale and computationally expensive problems. Second, although population-based interaction provides rich learning samples, the diversity of the population is not explicitly controlled, so a large number of duplicate samples are repeatedly consumed by reinforcement learning, which hinders the performance improvement of the algorithm. In addition, due to evaluation bias and the uncertainty of genetic operations, hybrid algorithms face significant fluctuations during the learning process.

To address the above problems, this paper first proposes a method named Surrogate-assisted Evolutionary Reinforcement Learning (SERL). The algorithm introduces a surrogate model and two model management strategies with an elite protection mechanism to reduce the cost of environmental interaction and lower the optimization difficulty, resulting in more efficient fitness evaluation and higher convergence quality. To further improve the performance of the hybrid algorithm, a Multi-objective Surrogate-assisted Evolutionary Reinforcement Learning algorithm (MSERL) is proposed. Based on the definition of policy diversity and efficient surrogate-assisted fitness evaluation, MSERL controls the genetic population to generate high-quality and diverse learning samples in a multi-objective optimization manner during the learning process, promoting the exploitation of samples by reinforcement learning and the exploration of the evolutionary population. This paper evaluates the two proposed algorithms, SERL and MSERL, on a series of robot motion control tasks. The experiments show that SERL has greater advantages in sample efficiency, computation time, robustness and convergence quality than standalone EAs, DRL algorithms and existing hybrid algorithms, and MSERL further improves the convergence speed and final performance.

Finally, to address the Quality-Similarity Diversity (QSD) problem that exists in practical scenarios, this paper builds a general diversity generation tool named QSD-TOOL based on SERL. The tool can efficiently generate a set of diverse policies that are of interest to users and of similar quality, and it has been successfully applied to robot motion control and game AI design.
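The abstract only sketches how surrogate-assisted evaluation and elite protection fit into the evolutionary loop; the details are left to the body chapters. As a rough, self-contained illustration of the general idea, and not of the actual SERL implementation (which evolves deep neural network policies and fits its surrogate to interaction data), the following Python sketch treats a policy as a flat parameter vector, scores most of the population with a cheap surrogate, and reserves real evaluations for the elite individuals. All names here (true_fitness, NearestNeighborSurrogate, evolve, n_elite) are placeholders introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for illustration only): a "policy" is a flat
# parameter vector, and true_fitness plays the role of an expensive
# environment rollout that the surrogate is meant to avoid.
DIM = 8

def true_fitness(theta):
    # Placeholder for the episodic return obtained by running the policy
    # in the real environment or simulator.
    return -np.sum((theta - 0.5) ** 2)

class NearestNeighborSurrogate:
    """A deliberately simple surrogate: predict the fitness of a new policy
    from the fitness of the closest previously evaluated policy."""
    def __init__(self):
        self.X, self.y = [], []

    def update(self, thetas, fitnesses):
        self.X.extend(thetas)
        self.y.extend(fitnesses)

    def predict(self, theta):
        d = [np.linalg.norm(theta - x) for x in self.X]
        return self.y[int(np.argmin(d))]

def evolve(generations=20, pop_size=10, n_elite=3):
    pop = [rng.normal(size=DIM) for _ in range(pop_size)]
    surrogate = NearestNeighborSurrogate()

    # Bootstrap the surrogate with one round of real evaluations.
    surrogate.update(pop, [true_fitness(p) for p in pop])

    for _ in range(generations):
        # Elite protection (illustrative): the individuals the surrogate
        # ranks highest are always re-evaluated in the real environment;
        # the rest are scored cheaply by the surrogate.
        order = np.argsort([surrogate.predict(p) for p in pop])[::-1]
        scores = {}
        for rank, i in enumerate(order):
            if rank < n_elite:
                scores[i] = true_fitness(pop[i])
                surrogate.update([pop[i]], [scores[i]])
            else:
                scores[i] = surrogate.predict(pop[i])

        # Simple truncation selection plus Gaussian mutation.
        parents = sorted(scores, key=scores.get, reverse=True)[:n_elite]
        pop = [pop[i] for i in parents] + [
            pop[rng.choice(parents)] + 0.1 * rng.normal(size=DIM)
            for _ in range(pop_size - n_elite)
        ]
    return max(pop, key=true_fitness)

if __name__ == "__main__":
    best = evolve()
    print("best fitness:", true_fitness(best))
```

The elite-protection step in this sketch ensures that the individuals which survive selection are always validated by real evaluations, so surrogate errors cannot silently promote poor policies; this is the same intuition behind the elite protection mechanism named in the abstract, though SERL's concrete model management strategies differ.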