
Research on Optimal Scheduling of the Integrated Energy System Based on Deep Reinforcement Learning

Author: 杨照
  • Student ID
    2019******
  • Degree
    Master
  • Email
    z-y******.cn
  • Defense Date
    2022.05.23
  • Supervisor
    黄少伟
  • Discipline
    Electrical Engineering
  • Pages
    100
  • Confidentiality Level
    Public
  • Department
    022 Department of Electrical Engineering
  • Keywords
    Reinforcement learning, integrated energy system, optimal scheduling, multi-agent structure, multiple uncertainties

Abstract

Optimal scheduling of an integrated energy system (IES) faces multiple challenges: complex energy coupling relationships, privacy protection made difficult by the multi-agent structure, and multiple uncertainties that are hard to handle. Data-driven deep reinforcement learning offers strong generalization, fast inference, and model-free operation, providing a new way to solve the IES optimal scheduling problem. This thesis therefore studies IES optimal scheduling, builds Markov decision process (MDP) models for scheduling scenarios ranging from a single-park IES to a multi-park IES, and designs a corresponding reinforcement learning algorithm for each scenario.

First, for optimal scheduling of a single-park IES, a pricing and scheduling strategy for the integrated energy service provider is proposed based on the deep deterministic policy gradient (DDPG) algorithm and knowledge transfer (KT). Considering the multi-agent structure inside the park IES, a Stackelberg game model is built whose players are the integrated energy service provider and multiple multi-energy users. To protect the users' private parameters and improve solution efficiency, a KT-DDPG algorithm is designed for the service provider, yielding pricing and scheduling strategies that do not depend on user information. A knowledge transfer method for new scenarios is further proposed to accelerate convergence.
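The abstract itself contains no implementation; as a purely illustrative aid, the following is a minimal sketch of the DDPG-style actor-critic update that an agent such as KT-DDPG would build on, assuming a PyTorch implementation. All dimensions, network sizes, hyperparameters, and the knowledge-transfer initialization path are assumptions, not values from the thesis.

```python
# Minimal DDPG-style sketch for the service provider's pricing/dispatch policy.
# Dimensions, hyperparameters, and file names are illustrative assumptions.
import copy
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 12, 4      # assumed: observed loads/prices vs. price + dispatch set-points
GAMMA, TAU = 0.99, 0.005           # typical discount factor and soft-update rate

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM), nn.Tanh())   # actions scaled to [-1, 1]

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
# Knowledge transfer (assumed mechanism): initialize from a policy trained on a
# similar source scenario instead of from scratch, e.g.
# actor.load_state_dict(torch.load("source_scenario_actor.pt"))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(batch):
    """One DDPG update from a replay-buffer batch (s, a, r, s2, done)."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        q_target = r + GAMMA * (1 - done) * critic_t(s2, actor_t(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(s, actor(s)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Soft-update the target networks.
    for net, tgt in ((actor, actor_t), (critic, critic_t)):
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(1 - TAU).add_(TAU * p.data)

# Example update with a random batch of 32 placeholder transitions.
B = 32
batch = (torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM),
         torch.randn(B, 1), torch.randn(B, STATE_DIM), torch.zeros(B, 1))
update(batch)
```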

Second, for cooperative economic dispatch of a multi-park IES, a cooperative economic dispatch strategy is proposed based on an improved multi-agent deep deterministic policy gradient (I-MADDPG) algorithm. A market for inter-park energy trading and its clearing method are designed, and, to strengthen privacy protection and scalability, the I-MADDPG algorithm follows a "centralized training, decentralized execution" scheme. During training, an equivalent dimension-reducing transform of the state-action sequences protects each park's privacy while lowering the input dimension of the algorithm; during execution, each park makes its scheduling decisions from its own observations only, which enables mutual energy support between parks, reduces communication requirements, and improves solution efficiency. The method does not rely on forecasts of the multiple uncertainties and can be applied to real-time scheduling.
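To make the "centralized training, decentralized execution" idea concrete, here is a minimal sketch of the MADDPG-style pattern behind it: each park's actor sees only its local observation, while a critic used during training is conditioned on a joint representation of all parks (simply concatenated here; the thesis uses an equivalent dimension-reducing transform instead). Park count and dimensions are assumptions.

```python
# Sketch of centralized-critic / decentralized-actor structure for multiple parks.
import torch
import torch.nn as nn

N_PARKS, OBS_DIM, ACT_DIM = 3, 8, 2   # illustrative sizes, not from the thesis

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

# One actor per park: local observation -> local action.
actors = [nn.Sequential(mlp(OBS_DIM, ACT_DIM), nn.Tanh()) for _ in range(N_PARKS)]
# One centralized critic per park, conditioned on the joint observation/action.
critics = [mlp(N_PARKS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_PARKS)]

def act(observations):
    """Execution phase: each park decides from its own observation only."""
    return [actor(obs) for actor, obs in zip(actors, observations)]

def centralised_q(i, observations, actions):
    """Training phase: critic i scores the joint state-action of all parks."""
    joint = torch.cat(observations + actions, dim=-1)
    return critics[i](joint)

# Example forward pass with random local observations for each park.
obs = [torch.randn(1, OBS_DIM) for _ in range(N_PARKS)]
acts = act(obs)
q0 = centralised_q(0, obs, acts)
```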
Finally, for emergency scheduling of a multi-park IES, an emergency scheduling strategy is proposed based on a hybrid-action deep deterministic policy gradient (HA-DDPG) algorithm. A mathematical model of emergency scheduling is established with the objective of maximizing the energy supply revenue over the period in which the external energy supply is lost. To handle coexisting continuous and discrete actions and a non-fixed action space, an HA-DDPG algorithm with an added action-masking layer is designed; the resulting emergency scheduling strategy guarantees continuous supply to critical loads. A multi-expert emergency scheduling system is further designed to cope with the uncertainty of the initial post-disaster topology, so that emergency supply strategies can be obtained quickly under a variety of post-disaster topologies.

In summary, this thesis applies deep reinforcement learning to the optimal scheduling of integrated energy systems, broadens the scope of IES optimal scheduling methods, and achieves optimal scheduling in complex scenarios with multiple stakeholders and multiple uncertainties, providing technical support for the economic and efficient operation of integrated energy systems.
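As a closing illustration of the action-masking idea used in the emergency-scheduling strategy described above, the following sketch shows one common way to combine continuous set-points with masked discrete options: a policy emits continuous dispatch values plus logits over discrete switching options, and a mask derived from the current (post-disaster) topology removes infeasible options. The sizes, heads, and masking semantics are assumptions for illustration only, not the thesis's HA-DDPG design.

```python
# Sketch of an action-masking layer for a hybrid continuous/discrete action space.
import torch
import torch.nn as nn

OBS_DIM, N_CONT, N_DISC = 10, 3, 5   # assumed observation, continuous, discrete sizes

class HybridPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU())
        self.cont_head = nn.Sequential(nn.Linear(64, N_CONT), nn.Tanh())
        self.disc_head = nn.Linear(64, N_DISC)

    def forward(self, obs, mask):
        """mask: boolean tensor, True where a discrete option is feasible."""
        h = self.trunk(obs)
        cont = self.cont_head(h)                            # continuous set-points in [-1, 1]
        logits = self.disc_head(h)
        logits = logits.masked_fill(~mask, float("-inf"))   # masking layer: drop infeasible options
        disc = torch.softmax(logits, dim=-1)                # probability only over feasible options
        return cont, disc

policy = HybridPolicy()
obs = torch.randn(1, OBS_DIM)
mask = torch.tensor([[True, True, False, True, False]])     # e.g. options 2 and 4 unavailable
continuous_action, discrete_probs = policy(obs, mask)
```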