Trajectory Planning of a Space Manipulator Based on Deep Reinforcement Learning

Author: 赖文灿
  • Student ID
    2020******
  • Degree
    Master's
  • Email
    leo******com
  • Defense date
    2023.05.25
  • Supervisor
    陈恳
  • Discipline
    Mechanical Engineering
  • Pages
    92
  • Confidentiality
    Public
  • Department
    012 Department of Mechanical Engineering
  • Keywords
    Space manipulator, Deep reinforcement learning, Trajectory planning, On-orbit servicing, End-to-end

Abstract

Capturing a malfunctioning satellite in orbit with a space manipulator system enables on-orbit servicing and space debris removal, and is of great significance for reducing space mission costs and for the sustainable development of near-Earth space. In the free-floating state, however, motion coupling arises between the manipulator and the satellite base, which degrades capture accuracy. To improve the autonomy of space manipulators, this thesis applies deep reinforcement learning (DRL) to capture trajectory planning for non-cooperative targets, achieving end-to-end control. The specific research contents are as follows.

First, a research framework for space-manipulator capture trajectory planning is established, and a MuJoCo dynamics simulation environment is developed. To address the limitations of MuJoCo's built-in position servo, a joint control scheme combining a joint-space planner and a tracker is designed and implemented (sketched below). By integrating mainstream deep reinforcement learning software tools, DRL algorithms can be applied to the simulation, policy training, and transfer verification of the space-manipulator capture task.

Second, a target 6D pose estimation network based on RGB-D images is designed (also sketched below). The network extracts features from the RGB image and the depth image separately, then fuses the features and regresses the target pose, enabling accurate pose prediction. A dataset is built from RGB-D images and ground-truth poses collected in the MuJoCo simulation environment; the network parameters are determined and the network is trained, achieving high-precision pose estimation.
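
The following is a minimal sketch of the planner-plus-tracker idea, assuming the official MuJoCo Python bindings, quintic rest-to-rest interpolation as the planner, and a joint-space PD torque tracker driving motor actuators; the abstract does not give the thesis's actual interpolation scheme, tracker, or gains, so all of these are illustrative.

    import numpy as np

    def quintic(q0, q1, t, T):
        # Rest-to-rest quintic interpolation: reference position and velocity
        # at time t of a move of duration T from q0 to q1.
        s = np.clip(t / T, 0.0, 1.0)
        pos = q0 + (q1 - q0) * (10 * s**3 - 15 * s**4 + 6 * s**5)
        vel = (q1 - q0) * (30 * s**2 - 60 * s**3 + 30 * s**4) / T
        return pos, vel

    def pd_track(model, data, q_ref, qd_ref, kp=80.0, kd=8.0):
        # Joint-space PD tracker: writes torques to motor actuators instead of
        # relying on MuJoCo's built-in position servo. Assumes the arm's hinge
        # joints occupy the last model.nu entries of qpos/qvel (a free-floating
        # base occupies the leading entries).
        q = data.qpos[-model.nu:]
        qd = data.qvel[-model.nu:]
        data.ctrl[:] = kp * (q_ref - q) + kd * (qd_ref - qd)

    # Per control step: sample the planner, apply the tracker, step the physics:
    #   q_ref, qd_ref = quintic(q_start, q_goal, data.time, T)
    #   pd_track(model, data, q_ref, qd_ref)
    #   mujoco.mj_step(model, data)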
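
The dual-branch pose network can likewise be sketched in PyTorch. The abstract specifies only separate RGB and depth feature extraction followed by fusion and regression, so the backbone depth, feature dimensions, and the translation-plus-quaternion output parameterization below are assumptions.

    import torch
    import torch.nn as nn

    class RGBDPoseNet(nn.Module):
        def __init__(self, feat_dim=256):
            super().__init__()
            def encoder(in_ch):
                # Small convolutional branch; RGB and depth get structurally
                # identical but separate encoders.
                return nn.Sequential(
                    nn.Conv2d(in_ch, 32, 5, stride=2, padding=2), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                    nn.Linear(64 * 16, feat_dim), nn.ReLU())
            self.rgb_enc = encoder(3)
            self.depth_enc = encoder(1)
            self.head = nn.Sequential(        # fuse, then regress the pose
                nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                nn.Linear(256, 7))            # 3 translation + 4 quaternion

        def forward(self, rgb, depth):
            fused = torch.cat([self.rgb_enc(rgb), self.depth_enc(depth)], dim=1)
            out = self.head(fused)
            t, q = out[:, :3], out[:, 3:]
            return t, q / q.norm(dim=1, keepdim=True)   # unit quaternion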

Third, capture trajectory planning based on relative pose is studied: an improved TD3 algorithm is proposed, and reward functions suited to the space capture task are designed. The principles of the reinforcement learning algorithms are analyzed and, to address the underestimation of the target Q-value in TD3, the minimum of the two target critic networks is replaced with an adjusted value (see the sketch below), which improves the performance and stability of the trained policy. For the relative-pose-based capture trajectory planning task, several types of reward functions are designed; experiments show that the model trained with a segmented dense reward function (also sketched below) converges more stably, achieving a 100% capture success rate in the simulation environment.

Finally, end-to-end visual-servoing capture is implemented in the simulation environment, and the transferability of the capture trajectory planning policy is validated in a real-world environment. A decoupled end-to-end visual servoing scheme is adopted (sketched below): the 6D pose estimation network passes its prediction to the capture trajectory planning policy, and the capture task is accomplished from RGB-D images in simulation. In the real-world experiment, a six-degree-of-freedom manipulator captures a spinning satellite-rocket docking ring model. The DRL policy takes the relative pose as input and outputs joint-angle increments in real time as the trajectory planning result, achieving a 66.7% capture success rate and validating that a policy trained in simulation can be transferred to the real world.
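
For the improved TD3 target, the abstract states only that the minimum over the two target critics is replaced by an adjusted value; it does not give the formula. The sketch below uses a convex combination of the minimum and maximum, one common way to soften TD3's underestimation, purely as an illustration.

    import torch

    def td3_target(q1, q2, reward, done, gamma=0.99, beta=0.75):
        # q1, q2: values of the two target critics at the smoothed next action.
        q_min = torch.min(q1, q2)        # vanilla TD3 uses this minimum alone
        q_max = torch.max(q1, q2)
        q_adj = beta * q_min + (1.0 - beta) * q_max  # hypothetical adjusted value
        return reward + gamma * (1.0 - done) * q_adj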
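
A segmented (piecewise) dense reward of the kind the abstract describes might look like the following; the segmentation threshold, weights, and success bonus are hypothetical, not the thesis's values.

    def segmented_reward(pos_err, ori_err, captured, d_switch=0.10):
        # Piecewise dense shaping: a far segment driven by position error only,
        # a near segment that also penalizes orientation error, and a terminal
        # bonus on a successful capture.
        if captured:
            return 10.0
        if pos_err > d_switch:
            return -pos_err
        return -(pos_err + 0.5 * ori_err)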
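
One step of the decoupled end-to-end loop can be written as below; pose_net and policy stand for the trained pose estimation network and the DRL policy, and their interfaces are assumed names rather than the thesis's code.

    def visual_servo_step(pose_net, policy, rgb, depth, q):
        # Decoupled pipeline: perception and planning are separate modules.
        rel_pose = pose_net(rgb, depth)   # estimated relative pose of the target
        dq = policy(rel_pose)             # joint-angle increments (the action)
        return q + dq                     # next joint command for the tracker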