登录 EN

添加临时用户

考虑众包配送的当日达场景动态派单与路径规划问题研究

Dynamic Dispatching and Route Planning for the Same-Day Delivery Problem with Crowdsourced Delivery

作者:李韬龙
  • 学号
    2021******
  • 学位
    硕士
  • 电子邮箱
    275******com
  • 答辩日期
    2024.05.17
  • 导师
    戚铭尧
  • 学科名
    物流工程与管理
  • 页码
    88
  • 保密级别
    公开
  • 培养单位
    599 国际研究生院
  • 中文关键词
    当日达问题;众包配送;车辆路径规划;动态派单;深度强化学习
  • 英文关键词
    Same-day Delivery Problem; Crowdsourced Delivery; Vehicle Routing; Dynamic Dispatching; Deep Reinforcement Learning

摘要

社区团购、本地生活服务等零售业态的蓬勃发展,推动了即时配送服务市场规模高速增长,配送场景也从餐饮拓展到百货生鲜等本地近场电商。国内外的服务商对此纷纷推出“当日达”的仓配模式。当日达服务以前置仓门店作为仓储场所,在运营时间内接收动态到达的订单并分配给配送员,以介于传统电商和餐饮外卖之间的小时级时效进行配送,这对服务商的履约能力提出了新的挑战。如何在保证履约时效的同时降低成本成了服务商面临的关键问题。伴随着共享经济和众包物流的发展,“轻资产”运营、灵活供给的众包配送成为原有运力的有效补充、受到业界青睐,许多服务商开始采用专职运力与众包运力结合的配送模式。在需求端时效提高、履约端运力模式转型的双重变化下,如何高效地派单并规划路径成为学术界和业界都十分重视的课题。本文研究了考虑众包配送的当日场景动态派单与路径规划问题,通过优化派单策略和路径规划实现降本增效。本文首先对问题的决策时刻、订单信息、众包与专职车队等要素进行描述,建立马尔科夫决策过程模型,目标为最小化由延误惩罚和配送里程成本构成的总履约成本。由于模型状态和动作空间过大导致的“维度灾难”,问题无法通过贝尔曼方程直接求解。现有研究常采用统计采样方法或专家经验策略求解,且考虑的运力模式单一,其效果和应用范围有限。本文引入深度强化学习方法来考虑未来长期收益,并结合运筹优化算法提高履约效率,更符合当下的应用需求。具体而言,本文设计了基于深度强化学习双层优化算法框架。对上层全局动态问题设计了深度强化学习算法,为智能体定义了新的观察空间和奖励函数,并基于近端策略优化算法训练模型,用于决策延迟派单与否。全局动态问题将反复求解下层子问题,因此对下层子问题设计了基于成本的插入算法和变邻域搜索算法, 采用启发式方法快速获取单轮次高质量路径解。为了评估算法的有效性,本文在贴合现实的多种运力规模、订单规模和时空类型的算例上进行数值实验。针对单轮次子问题,数值实验验证了启发式算法的收敛性和可靠性。对于全局问题,本文对多种动态派单策略进行比较,结论显示相比于其他基准策略,本文的深度强化学习联合优化算法可以显著降低总成本,也说明了合理的延迟派单策略具有集单效果。此外,本文针对订单规模等重要参数进行了泛化性实验和灵敏度分析,验证了模型的泛化能力,并提出了具有实践价值的管理学见解。

The instant delivery service market has grown rapidly due to the thriving development of retail formats including local life services and community group buying. In addition to catering, local e-commerce for groceries and other goods is now included in delivery scenarios. The "same-day delivery" warehousing and distribution model is implemented by domestic and international service providers for local e-commerce. This model bridges the gap between traditional e-commerce and food delivery by using front-end stores as storage facilities. Orders are dynamically incoming during operating hours, and they are dispatched to couriers with the goal of delivery within hours. This puts additional demands on service providers‘ fulfillment capabilities. For service providers, striking a balance between cost containment and timely completion has become a challenge. Alongside the development of the sharing economy and crowdsourced logistics, the "asset-light" operation and flexible supply of crowdsourced delivery have become an effective complement to existing couriers, and are favored by many service providers who adopt a hybrid fleets model combining full-time couriers and crowdsourced delivery. Amidst the dual changes of improving demand-side timeliness and the complexity of fulfillment-side capacity models, both the academic and industry sectors now place a high value on successful order dispatch and route planning. This thesis investigates the dynamic order dispatching and route planning problem in same-day delivery problem with crowdsourced delivery, aiming to reduce costs and increase efficiency. Firstly, this thesis describes the elements of same-day delivery, including decision moments, order information, and hybrid fleets. Next, a Markov decision process (MDP) model is established, where the decision objective is to reduce the overall fulfillment cost composed of delay penalties and delivery mileage costs. The "curses of dimensionality" resulting from the MDP model’s large state and action spaces, the Bellman equation cannot be solved directly. Previous studies use statistical sampling methods or expert experience strategies for resolution, frequently with limited effectiveness and scalability. This thesis introduces an approach to improve fulfillment efficiency by combining optimization algorithms with a deep reinforcement learning method that takes long-term future benefits into account.Specifically, this thesis designs a dual-layer optimization algorithm under the framework of deep reinforcement learning. For the upper-level global dynamic problem, the deep reinforcement learning algorithm defines the specialized observation space and reward shaping for the agent and trains the model based on the proximal policy optimization algorithm for the decision of delaying dispatch or not. The global dynamic problem iteratively solves the lower-level subproblems, thus a cost-based insertion heuristic and the variable neighborhood search algorithm are designed for the lower-level subproblem, using heuristics to quickly obtain high-quality solutions for the single-round subproblems. In order to evaluate the effectiveness of the algorithm, this thesis performs numerical experiments on instances that fit real-world scenarios with various capacity scales, order sizes, and spatio-temporal types. For the single-round subproblem, the experiment verifies the convergence and robustness of the heuristic algorithm through result comparisons and visual analysis. For the global problem, this thesis compares multiple dynamic dispatch strategies, and the results demonstrate that the deep reinforcement learning combined with the optimization algorithm in this work can significantly reduce the total cost compared to other benchmark strategies, indicating the utility of delayed dispatch strategies. Finally, generalized experiments and sensitivity analysis on important parameters like order size are performed, verifying the model‘s generalization capability and deriving managerial insights.