

Virtual Power Plant Aggregation and Deep Reinforcement Learning Based Optimal Operation

Author: Wang Long
  • Student ID
    2018******
  • Degree
    Master
  • Email
    wlo******com
  • Defense date
    2021.05.18
  • Advisor
    Wu Wenchuan
  • Discipline
    Electrical Engineering
  • Pages
    72
  • Confidentiality level
    Public
  • Institute
    600 Tsinghua-Berkeley Shenzhen Institute
  • Keywords
    Virtual Power Plant, reinforcement learning, Markov decision process, energy management, Volt-VAR control

Abstract

Large-scale distributed energy resources (DERs), such as distributed generation, energy storage, and controllable loads, are being integrated into distribution networks owing to their low emissions and low operating costs. In the current electricity market, entities with small generation capacity cannot directly participate in market bidding and system operation. On the one hand, a single small-capacity entity has little impact on the overall operation of the power grid, and without incentives such entities have little motivation to participate in market operation. On the other hand, complementary DERs with short electrical distances should be aggregated and managed as a whole to reduce overall operating costs. This dissertation studies flexible-resource aggregation technology to form virtual power plants (VPPs) and applies deep reinforcement learning (DRL) to solve the energy management and secure operation problems of a VPP under fully observable and partially observable conditions.

The aggregation of VPPs provides a way to activate the regulation capacity of flexible resources. This dissertation proposes a resource aggregation method that accounts for network reconfiguration, formulated as an optimal network partition model that minimizes the voltage deviation and the fluctuation of injected power inside each VPP. A new convex formulation of the network reconfiguration strategy is incorporated into this method, which guarantees that the components of the same VPP remain topologically connected and further improves VPP performance. The proposed model is cast as a mixed-integer linear program (MILP) and can be solved effectively. In addition, a scenario reduction method based on the k-shape algorithm is developed to reduce the computational burden.

Because an accurate physical model of a VPP is often unavailable, traditional model-based optimization methods are impractical.
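As an illustration of the scenario-reduction step, the following is a minimal sketch built on the shape-based distance (SBD) that underlies k-shape clustering. The greedy medoid selection used here is a simplification for illustration, not the dissertation's exact procedure:

```python
import numpy as np

def sbd(x, y):
    """Shape-based distance used by k-shape: 1 minus the maximum
    normalized cross-correlation over all alignments (shifts)."""
    cc = np.correlate(x, y, mode="full")          # cross-correlation at every shift
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return 1.0 - cc.max() / denom if denom > 0 else 0.0

def reduce_scenarios(scenarios, k):
    """Greedily keep the k scenarios that best cover the full set under SBD.
    (A stand-in for full k-shape clustering, kept short for illustration.)"""
    n = len(scenarios)
    d = np.array([[sbd(scenarios[i], scenarios[j]) for j in range(n)]
                  for i in range(n)])
    chosen = []
    for _ in range(k):
        # pick the candidate that most reduces the total distance
        # from every scenario to its nearest kept representative
        best = min((i for i in range(n) if i not in chosen),
                   key=lambda i: d[chosen + [i]].min(axis=0).sum())
        chosen.append(best)
    return chosen
```

Because SBD is shift- and scale-invariant, load or generation profiles with the same shape collapse onto the same representative, which is what makes the reduction cheap without losing the diversity of the scenario set.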
This dissertation considers two operating conditions for a VPP: one in which the VPP is fully observable and one in which it is only partially observable.

For a fully observable VPP, the energy management problem can be formulated as a stochastic, dynamic optimization of the battery energy storage system. When battery degradation is included in the model, the problem becomes nonconvex and hard to tackle. The dispatch of the battery energy storage system is modeled as a multistage stochastic optimization problem (MSOP), and an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is proposed to solve it. By improving the sampling strategy in the reinforcement learning (RL) process so that the agent tends to select samples with higher reward, the method reduces the search space and accelerates convergence.

For a partially observable VPP, the objective is to improve security and reduce network loss. However, the network loss cannot be explicitly formulated because the system state is only partially observable. Since nodes serving similar loads behave similarly, load classification is applied and a recurrent neural network (RNN) is used to extract an internal knowledge description of the network loss, from which the loss is approximated. The partially observable Volt-VAR control (VVC) problem is modeled as a partially observable Markov decision process (POMDP), and an RNN-TD3 algorithm is proposed to solve it. Numerical tests show that it reduces network losses and improves the voltage profile.

To sum up, this dissertation focuses on the aggregation of controllable resources into VPPs and realizes economical and secure operation of the VPP in two scenarios: fully observable and partially observable.
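The sampling-strategy improvement described for the TD3 training can be illustrated with a minimal reward-biased replay buffer. The class name, the softmax weighting, and the temperature parameter are illustrative assumptions, not the dissertation's exact scheme:

```python
from collections import deque
import numpy as np

class RewardPrioritizedBuffer:
    """Replay buffer that samples transitions with probability increasing
    in their reward, so high-reward experience is revisited more often.
    (Illustrative sketch; the softmax weighting is an assumption.)"""

    def __init__(self, capacity=10000, temperature=1.0):
        self.buffer = deque(maxlen=capacity)
        self.temperature = temperature  # higher -> closer to uniform sampling

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        rewards = np.array([t[2] for t in self.buffer], dtype=float)
        # softmax over rewards: high-reward transitions are drawn more often,
        # which shrinks the effective search space the agent explores
        logits = (rewards - rewards.max()) / self.temperature
        probs = np.exp(logits)
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]
```

Raising `temperature` recovers near-uniform replay, so the bias toward high-reward samples can be tuned against the risk of overfitting to early lucky episodes.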