Owing to its high mobility, flexible deployment, low operating cost, and wide coverage, Unmanned Aerial Vehicle (UAV)-assisted communication has drawn increasing attention from both academia and industry in recent years. Its main applications fall into two categories: UAVs serving as base stations (UAV-BSs) that cover ground users (GUs), and UAVs serving as relays that forward data between GUs and BSs. Compared with traditional terrestrial communication systems, UAV-assisted communication has extra degrees of freedom in the air. UAV trajectories can be optimized to improve system performance, but the optimization problem becomes much more complex. In addition, UAVs are energy-constrained, so the energy consumed during flight must also be taken into account, which further increases the problem complexity. With the development of deep reinforcement learning (DRL), a growing number of highly complex problems can now be solved effectively. In this paper, we use DRL as a mathematical tool to study intelligent UAV-assisted communication systems. The main contributions are as follows:

For the coverage of GUs by a single UAV-BS under a limited energy budget, we formulate the energy consumption of a quad-rotor UAV-BS as a function of its 3D trajectory. We then optimize the trajectory and the frequency band allocation with DRL. Training suffers from dimension imbalance, vanishing gradients, and instability. To address these problems, we propose dimensional spread, pre-activation penalty, and softmax reference (a reference neuron). These techniques allow the UAV-BS to improve system throughput and guarantee fairness among GUs under the limited energy constraint.

For the air-ground coordinated communication system with multiple UAV-BSs, this paper proposes a joint optimization method for UAV-BS trajectory design and GU access control.
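The first contribution models the flight energy of a quad-rotor UAV-BS as a function of its 3D trajectory, but this summary does not state the model itself. The sketch below is minimal and rests on two assumptions of ours, not necessarily the model used in this paper. Horizontal flight uses the widely used rotary-wing propulsion power model. Climbing adds a simple term W·v_z, where W is the aircraft weight and v_z the vertical speed. Energy is then accumulated slot by slot along sampled waypoints. The names `RotorParams`, `horizontal_power`, and `trajectory_energy` are hypothetical, and all parameter values are placeholders.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class RotorParams:
    """Hypothetical quad-rotor parameters (placeholder values, not from this paper)."""
    p0: float = 79.86     # blade profile power in hover (W)
    pi: float = 88.63     # induced power in hover (W)
    u_tip: float = 120.0  # rotor blade tip speed (m/s)
    v0: float = 4.03      # mean rotor induced velocity in hover (m/s)
    d0: float = 0.6       # fuselage drag ratio
    rho: float = 1.225    # air density (kg/m^3)
    s: float = 0.05       # rotor solidity
    area: float = 0.503   # rotor disc area (m^2)
    weight: float = 20.0  # aircraft weight W (N)


def horizontal_power(v: float, p: RotorParams) -> float:
    """Propulsion power (W) at horizontal speed v (m/s), rotary-wing model."""
    blade = p.p0 * (1.0 + 3.0 * v**2 / p.u_tip**2)
    induced = p.pi * math.sqrt(
        math.sqrt(1.0 + v**4 / (4.0 * p.v0**4)) - v**2 / (2.0 * p.v0**2)
    )
    parasite = 0.5 * p.d0 * p.rho * p.s * p.area * v**3
    return blade + induced + parasite


def trajectory_energy(
    waypoints: Sequence[Point3D], dt: float, p: RotorParams = RotorParams()
) -> float:
    """Energy (J) spent flying through waypoints sampled every dt seconds."""
    energy = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(waypoints, waypoints[1:]):
        v_h = math.hypot(x1 - x0, y1 - y0) / dt  # horizontal speed in this slot
        v_z = (z1 - z0) / dt                     # vertical speed in this slot
        # Assumption: climbing costs W * v_z; descending earns no energy credit.
        climb = p.weight * max(v_z, 0.0)
        energy += (horizontal_power(v_h, p) + climb) * dt
    return energy


if __name__ == "__main__":
    path = [(0.0, 0.0, 100.0), (10.0, 0.0, 100.0), (20.0, 0.0, 105.0)]
    print(f"{trajectory_energy(path, dt=1.0):.1f} J")
```

At zero speed this model reduces to the hover power P0 + Pi. Even a stationary UAV-BS therefore drains its battery, which is why energy has to be tracked along the whole trajectory rather than only while moving.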
Agent heterogeneity in the air-ground coordinated system creates a mixed action space. To handle it, we propose to transform discrete actions into continuous action probabilities. This enables the joint optimization of UAV-BSs and GUs, improves system throughput, and guarantees fairness among GUs.

To address the packet routing problem in a hovering multi-hop UAV relay network, this paper proposes an inter-UAV DRL training mechanism that relates the state-action value function to the transmission path length. Maximizing the state-action value is then equivalent to shortening the transmission path while avoiding network congestion. For multi-hop UAV relay networks with dynamic topologies, we further propose an intra-UAV training mechanism that divides each UAV into multiple subagents, thereby reducing each UAV's action space. By optimizing the trajectories, frequency band allocation, and packet routing of the relay UAVs, we reduce the network congestion probability, improve network throughput, and shorten the transmission time.
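The inter-UAV mechanism ties the state-action value to the transmission path length, but this summary does not give the exact reward. The sketch below is a minimal tabular one and makes two assumptions of ours. Each hop earns a reward of −1, plus a hypothetical extra cost for forwarding into a congested relay, and there is no discounting. Under these assumptions Q(s, a) equals minus the remaining cost of reaching the destination through neighbor a, so following argmax Q yields the shortest uncongested route. Exact value iteration on a known graph stands in for DRL training here. The names `q_values`, `greedy_route`, and `penalty` are illustrative only.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]
QTable = Dict[Tuple[int, int], float]


def q_values(graph: Graph, dest: int, penalty: Dict[int, float],
             max_iter: int = 1000) -> QTable:
    """Value iteration for Q(s, a), where action a means 'forward to neighbor a'.

    Assumed reward: -(1 + penalty[a]) per hop, undiscounted (gamma = 1).
    """
    q: QTable = {(s, a): 0.0 for s in graph if s != dest for a in graph[s]}
    for _ in range(max_iter):
        new_q: QTable = {}
        for (s, a) in q:
            reward = -(1.0 + penalty.get(a, 0.0))
            # Delivery to dest ends the episode; otherwise add the best next-hop value.
            future = 0.0 if a == dest else max(q[(a, b)] for b in graph[a])
            new_q[(s, a)] = reward + future
        if new_q == q:  # converged: Bellman update no longer changes the table
            return new_q
        q = new_q
    raise RuntimeError("value iteration did not converge")


def greedy_route(graph: Graph, q: QTable, src: int, dest: int) -> List[int]:
    """Follow argmax_a Q(s, a) from src until the packet reaches dest."""
    route = [src]
    while route[-1] != dest:
        s = route[-1]
        route.append(max(graph[s], key=lambda a: q[(s, a)]))
        if len(route) > len(graph):
            raise RuntimeError("routing loop")
    return route


if __name__ == "__main__":
    # Hypothetical relay topology; node 5 is the ground base station.
    graph: Graph = {0: [1, 2], 1: [0, 5], 2: [0, 3], 3: [2, 5], 5: [1, 3]}
    q = q_values(graph, dest=5, penalty={})
    print(greedy_route(graph, q, 0, 5), q[(0, 1)])  # [0, 1, 5] -2.0
    q = q_values(graph, dest=5, penalty={1: 5.0})   # relay 1 is congested
    print(greedy_route(graph, q, 0, 5))             # [0, 2, 3, 5]
```

Without the penalty, the packet takes the two-hop route through relay 1, and Q(0, 1) = −2, which is minus the hop count. Penalizing relay 1 makes the three-hop detour through relays 2 and 3 preferable. A DRL agent would approximate this table with a neural network instead of enumerating it.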