面向多自由度机器人的技能制定与学习方法研究是智能机器人领域的核心研究问题之一。其中,行为示教的机器人技能制定方法,因其制定技能过程具有极高的便捷性,以及制定技能最终效果具有很好的可预测性,受到了业界和学术界较多的关注。然而,现有的行为示教机器人技能模仿学习方法普遍对示教样本的质量和数量提出了较高要求,限制了其在实际机器人技能学习任务中的应用能力。本文针对现有工作的不足,围绕“示教质量不佳情况下难以有效学习”、“专家提供精细示教轨迹难度大”、“长序列任务下示教数据利用效率低”等科学问题,提出以下三点创新:

1. 针对示教质量不佳情况下难以有效学习的问题,提出了一种将不完美示教用作软约束的技能强化学习方法。区别于现有工作,本文提出的方法能够在有效利用示教中包含的信息、有效完成最优策略学习的同时,规避示教的不完美性对学习策略的最优性和收敛性造成的影响,并且能够借助不完美示教中的有效信息,实现对策略学习效率的提升。

2. 针对专家提供精细示教轨迹难度大的问题,提出了一种仅利用专家倾向性选择信息为依据的技能增强与迁移方法。区别于现有工作,本文构建了一种基于分布迁移的倾向性学习方法,在通过少量粗糙行为示教对策略初始化后,专家仅需要对机器人产生的行为样本进行倾向性筛选即可实现对目标任务策略的成功学习,避免了直接向专家索取精细轨迹示教的困难。

3. 针对长序列任务下示教数据利用效率低的问题,提出了基于Option的层次化技能模仿学习方法。区别于现有工作,本文首次提出了层次化策略与“子任务-状态-动作对”出现频度的一一对应关系,构建了基于出现频度差异的层次化技能模仿学习方法,并设计了一种迭代式的优化算法,通过对长序列任务示教中子任务的自动划分与整合,实现了层次化策略的端到端学习,从而提升了对示教样本的利用效率。

在几类多自由度机器人仿真环境下的实验结果验证了本文提出方法的有效性,展现了本文提出的算法相对现有行为示教技能学习算法的性能优势。
Skill design and learning for multi-degree-of-freedom robots is one of the core research problems in intelligent robotics. Among existing approaches, imitation learning from behavioral demonstrations has attracted considerable attention from both industry and academia, owing to the convenience of the skill-design process and the predictability of the resulting skills. However, existing imitation learning methods generally place high demands on the quality and quantity of demonstrations, which limits their applicability in practical robot skill learning tasks. To address these deficiencies, this thesis focuses on three scientific problems: "how to learn effectively when demonstrations are imperfect", "how to learn robot skills without demanding elaborate demonstrations from experts", and "how to use demonstrations efficiently in long-horizon tasks", and proposes the following three innovations:

1. To learn effectively from imperfect demonstrations, a reinforcement learning method that treats imperfect demonstrations as a soft constraint is proposed. Unlike existing works, the proposed method exploits the information contained in the demonstrations while learning the optimal policy, avoids the impact of demonstration imperfection on the optimality and convergence of the learned policy, and further leverages the useful information in imperfect demonstrations to improve learning efficiency.

2. To avoid demanding elaborate demonstration trajectories from experts, a skill enhancement and transfer method driven solely by expert preference selections is proposed. Unlike existing works, a preference learning method based on distribution transfer is constructed: after the policy is initialized with a small number of rough demonstrations, the expert only needs to make preference-based selections over the trajectories generated by the robot to learn the target-task policy successfully, avoiding the difficulty of eliciting elaborate demonstrations.

3. To improve demonstration utilization in long-horizon tasks, an Option-based hierarchical imitation learning method is proposed. Unlike existing works, this thesis establishes, for the first time, a one-to-one correspondence between hierarchical policies and option-state-action occupancy measures, constructs a hierarchical imitation learning method that minimizes the discrepancy between occupancy measures, and designs an iterative optimization algorithm that automatically partitions and integrates the sub-tasks within long-horizon demonstrations, enabling end-to-end learning of the hierarchical policy and thereby improving demonstration utilization.

Experiments in several multi-degree-of-freedom robot simulation environments confirm the effectiveness of the proposed methods and demonstrate their performance advantages over existing imitation learning algorithms.
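The soft-constraint idea behind the first contribution can be illustrated with a minimal sketch, assuming a penalized objective (the function name, arguments, and exact penalty form are hypothetical, not the thesis's actual formulation): the demonstration only influences learning when the policy's divergence from it exceeds a slack threshold, so an imperfect demonstration can guide exploration without binding the final policy to its mistakes.

```python
def soft_constraint_loss(rl_loss, demo_divergence, epsilon, lam):
    """Illustrative soft-constraint objective (hypothetical names).

    rl_loss:         standard reinforcement learning loss
    demo_divergence: divergence between policy and demonstration
    epsilon:         slack allowed before the constraint activates
    lam:             penalty weight (e.g. a Lagrange multiplier)
    """
    # Penalize only the violation beyond the slack, so once the
    # policy stays within the allowed divergence the imperfect
    # demonstration no longer pulls it away from the optimum.
    violation = max(0.0, demo_divergence - epsilon)
    return rl_loss + lam * violation
```

With this shape of objective, a perfect demonstration (small slack) constrains the policy tightly, while a noisy one (large slack) degrades gracefully into unconstrained reinforcement learning.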
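The occupancy-measure view in the third contribution can be sketched as follows. This is a simplified illustration, assuming empirical occupancy measures stored as dictionaries over (option, state, action) triples and compared by total-variation distance; the thesis's actual discrepancy measure and optimization procedure are not reproduced here.

```python
def occupancy_discrepancy(expert_occupancy, policy_occupancy):
    """Total-variation distance between two empirical occupancy
    measures over (option, state, action) triples (illustrative).

    Each argument maps a triple to its visitation probability;
    missing triples are treated as probability zero.
    """
    keys = set(expert_occupancy) | set(policy_occupancy)
    return 0.5 * sum(
        abs(expert_occupancy.get(k, 0.0) - policy_occupancy.get(k, 0.0))
        for k in keys
    )
```

Because the occupancy measure determines the hierarchical policy one-to-one, driving such a discrepancy to zero recovers both the sub-task segmentation and the per-sub-task behavior from a single set of long-horizon demonstrations.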