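As space activity becomes more frequent, the number of spacecraft of all kinds in the space environment keeps growing. Thanks to their good maneuverability, free-floating space robots play an important role in repair, assembly, and cleanup tasks. Traditional planning and control methods require complex modeling and controller designs that depend on expert knowledge, which limits the development of robot control platforms, whereas model-free reinforcement learning effectively overcomes these problems. However, achieving high-precision and robust trajectory planning in a complex space environment remains an open problem. Therefore, for four working conditions (multiple target positions, multiple target poses, dynamic targets, and image input), this thesis proposes control frameworks based on action-ensemble multi-target planning, hierarchical decoupled optimization, two-module trajectory tracking, and end-to-end learning, addressing intelligent planning and control of space robots in a model-free, data-driven setting. The main contributions are as follows: (1) For trajectory planning with multiple target positions, a planning method based on the PPO reinforcement learning algorithm is proposed, together with an action ensemble method based on the Poisson distribution that improves planning accuracy after convergence. Experimental results show that the trained policy tracks target positions along the desired trajectory with reasonable accuracy and is robust, to a degree, against unknown disturbances. (2) For obstacle-avoiding planning with multiple target poses, a hierarchical decoupled optimization framework is proposed that achieves trajectory planning under multi-target position and attitude requirements while avoiding collisions with obstacles in the environment. The upper layer generates a reasonable end-effector trajectory, and the decoupled agents in the lower layer plan the end-effector position and attitude separately. In addition, for training the two agents, an event-based alternating iterative optimization method is proposed so that training is stable and convergent. (3) For tracking planning on a non-cooperative target, a two-module system combining a perception-prediction module and a trajectory planning module is proposed. The perception-prediction module handles attitude recognition of the non-cooperative target and trajectory prediction of the target points on it, and the trajectory planning module handles trajectory planning for a dual-arm space robot under base constraints. The method allows the whole system to reach the two target points on the non-cooperative target within a certain time window, which in turn supports the subsequent capture of the target. (4) For end-to-end learning of planning and control, a perception-planning-control framework based on an entropy-regularized reinforcement learning algorithm is proposed, addressing policy optimization over high-dimensional image features. Experimental results show that the trained policy effectively extracts informative features from the images and thereby drives the end-effector toward the target. In summary, the four methods in this thesis effectively solve trajectory planning problems under four different requirements and open up new research directions for intelligent operation of future space robots.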
With the increasing number of spacecraft, free-floating space robots play an important role in missions such as repair, assembly, and cleaning. Traditional control methods rely heavily on complex modeling and expert experience, which restricts the large-scale application of space robots. Model-free reinforcement learning (RL) performs well on these tasks, especially trajectory planning. However, achieving high accuracy and robustness at the same time remains an open challenge in space robotics. Therefore, this thesis proposes several efficient approaches to trajectory planning under four working conditions: multi-position targets, multi-pose targets, dynamic targets, and image inputs. The key contributions are summarized as follows: (1) To solve multi-position trajectory planning, we design a model-free method based on the PPO algorithm. With the action ensemble method we propose, our approach improves accuracy, robustness, and generalization. (2) For collision-free multi-pose trajectory planning, we develop a novel framework, Hierarchical Decoupling Optimization. The method substantially improves the sampling and optimization efficiency of RL-based algorithms. In particular, the event-based alternating optimization process keeps learning stable and convergent. (3) For dynamic targets on a non-cooperative satellite, we propose a two-module system that helps a dual-arm space robot track the targets. The first module estimates the pose of the spinning satellite and predicts the poses of the targets. The second module performs trajectory planning according to the predicted positions of the targets. (4) For the end-to-end learning problem, we construct an integrated system consisting of perception and planning parts. The experiments demonstrate that the trained policy successfully extracts features from high-dimensional images and accomplishes the trajectory planning task. In summary, the planning approaches are applied under four typical conditions. We hope this study will benefit future research in the space robotics community.
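
The abstract names PPO as the base algorithm for contribution (1) without stating its objective. For readers unfamiliar with it, the sketch below shows the standard clipped surrogate loss from the general PPO literature. It is a generic illustration, not the thesis's implementation: the function name `ppo_clip_loss`, the default clip range of 0.2, and the plain-list inputs are all assumptions made for this example, and the action ensemble method mentioned above is not reproduced here.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Generic PPO clipped surrogate loss (hypothetical helper, to be minimized).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate the mean so that gradient descent maximizes the surrogate.
    return -total / len(advantages)
```

The minimum of the two terms means a positive-advantage action gains nothing once its probability ratio exceeds 1 + eps, which is what keeps each policy update close to the policy that collected the data.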