Quadrupedal robots are capable of high-speed locomotion on flat ground as well as stable locomotion over rugged terrain. However, unlike quadrupedal animals that run rapidly over uneven ground, existing quadrupedal controllers have not combined terrain adaptability with high-speed locomotion to achieve running on uneven terrain. Inspired by the mechanism of animal muscle activation, we propose a quadrupedal robot controller based on joint feedback gain adjustment. Its core is a trajectory-tracking control policy, optimized via reinforcement learning, that sets each joint's feedback gains as well as its feed-forward torque. Through training, the policy learns to select appropriate joint feedback gains according to the robot's state and the tracking target, so that the robot can resist the disturbances caused by uneven terrain. To step over obstacles that may appear on uneven ground, we design a reference joint trajectory planner that balances foot clearance against locomotion speed under joint velocity limits and supplies reference trajectories to the control policy. We also design a state estimator dedicated to agile locomotion, which provides state estimates to the control policy. To improve both the efficiency of reinforcement learning on this control policy and the resulting performance, we propose independent curriculum learning, a curriculum learning approach for parallel reinforcement learning environments. The approach adjusts each environment's difficulty with a randomized rule, so that across all parallel environments a small portion remain in the low-difficulty region while the rest stay at medium difficulty. During training, we use this approach to simultaneously adjust the forward velocity command and the level of terrain unevenness of every simulated environment. We evaluate the controller on a variety of terrains in simulation and in the real world. Using proprioception only, our method drives the robot across multiple static and dynamic terrains in simulation that were never encountered during training. In real-world tests, we validate the robot's running behavior on slopes, stairs, grass, and other uneven terrains.
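The gain-adjustment scheme above is commonly realized as a PD-plus-feedforward joint torque law whose gains are outputs of the policy rather than fixed constants. The sketch below illustrates this idea; the function name, the torque limit, and the exact form of the law are assumptions for illustration, not the thesis's exact implementation.

```python
import numpy as np

def joint_torque(q, dq, q_ref, dq_ref, kp, kd, tau_ff, tau_limit=33.5):
    """Illustrative PD tracking with policy-set gains and feed-forward torque.

    q, dq         : measured joint positions / velocities
    q_ref, dq_ref : reference trajectory from the planner
    kp, kd        : per-joint feedback gains chosen by the control policy
    tau_ff        : feed-forward torque chosen by the control policy
    tau_limit     : assumed actuator torque limit (hypothetical value)
    """
    tau = kp * (q_ref - q) + kd * (dq_ref - dq) + tau_ff
    # saturate to respect actuator limits
    return np.clip(tau, -tau_limit, tau_limit)
```

With low gains the leg yields compliantly to terrain impacts; with high gains it tracks the reference stiffly, which is why letting the policy modulate `kp` and `kd` per state can help on uneven ground.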
Our controller achieved a maximum running speed of 2.0 m/s on uneven outdoor terrain. These experiments demonstrate that our approach is capable of running on uneven terrain. The key contributions of this paper are summarized as follows:
1. We propose an RL-trained quadrupedal robot controller based on joint feedback gain adjustment. The controller applies appropriate joint feedback gains to stabilize the robot while running on uneven terrain.
2. We propose independent curriculum learning for the controller's training, which increases the proportion of effective training data and eliminates the need to compute global statistics.
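The abstract does not give the exact per-environment update rule for independent curriculum learning, but its stated properties (each environment adjusts its own difficulty by a randomized rule, no global statistics, a small portion of environments kept at low difficulty) can be sketched as follows. All specifics here, including the reset probability and the demote-on-failure behavior, are assumptions for illustration.

```python
import random

def update_difficulty(level, solved, max_level, p_reset=0.1):
    """Illustrative independent per-environment curriculum update.

    Each parallel environment updates its own difficulty level using only
    its local outcome, so no global statistics are needed:
      - on failure, demote one level (assumed behavior);
      - on success, usually promote one level, but with probability
        p_reset jump back to a random low level, so a small fraction of
        environments keeps training at low difficulty.
    """
    if not solved:
        return max(level - 1, 0)                  # demote on failure
    if random.random() < p_reset:
        return random.randint(0, max_level // 4)  # keep some easy envs
    return min(level + 1, max_level)              # promote on success
```

In training, one such call per environment per episode would adjust, for example, an index into increasing forward-velocity commands and terrain-roughness levels.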