面向即时配送路径规划与订单分配的智能优化决策

Intelligent Optimization and Decision-Making for On-Demand Delivery Oriented Route Planning and Order Allocation

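Author: 王兴 (Wang Xing)
  • Student ID
    2017******
  • Degree
    Doctoral
  • Email
    wan******.cn
  • Defense date
    2023.05.09
  • Advisor
    王凌 (Wang Ling)
  • Discipline
    Control Science and Engineering
  • Page count
    163
  • Confidentiality level
    Public
  • Department
    025 Department of Automation
  • Chinese keywords
    即时配送,路径规划,订单分配,机器学习,优化决策
  • English keywords
    on-demand delivery, route planning, order allocation, machine learning, optimization and decision-making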

Abstract

As an emerging form of logistics service, on-demand delivery is characterized by large scale, tight time constraints, uncertainty, and strong dynamics. Efficient optimization and decision-making methods help improve delivery efficiency and user experience. This dissertation addresses route planning and order allocation in on-demand delivery. It designs effective heuristic rules and strategies, builds machine learning tasks and models from massive data, and proposes intelligent optimization and decision-making methods that combine operations research optimization with machine learning, improving the efficiency and experience of on-demand delivery services.

After reviewing research on route planning and order allocation in on-demand delivery, as well as machine-learning-based scheduling optimization, the dissertation presents the following results:

(1) For the rider route planning problem in on-demand delivery, an insertion-based route construction algorithm and several order sequencing rules are designed, and a machine-learning-based mechanism is established to select among the sequencing rules adaptively. Together these yield a rider route planning method that combines heuristic rules with machine learning.

(2) For the order assignment problem in the professional delivery scenario, a machine learning model is established to batch similar orders, and an improved Kuhn-Munkres (KM) algorithm is used to match orders with riders quickly. Together these yield a two-stage order assignment method based on order batching and KM matching.

(3) For the order recommendation problem in the crowdsourcing delivery scenario, the uncertainty of riders' order grabbing is considered. A machine learning model is established to estimate each rider's willingness to grab orders, and a heuristic order recall and ranking algorithm and several local search operations are designed based on the properties of the problem. Together these yield a hierarchical "prediction-optimization" solution framework.

(4) For the order recommendation problem in dynamic scenarios, where order and rider information changes continuously, a reinforcement learning order recommendation framework is designed based on an actor-critic network, and an attention-based feedback correlation network is designed to identify and mine riders' feedback information. Together these yield a deep reinforcement learning order recommendation framework with feedback information identification.

(5) For the order recommendation problem with multiple riders, order-grabbing conflicts among riders are considered. Within the reinforcement learning framework, an "encoder-decoder" actor network is built on a time series prediction model, and rider sequence generation rules are designed to map the outputs of the actor network to individual riders. Together these yield a deep reinforcement learning order recommendation framework with a time series prediction model.

The computational complexity of the problems and methods is analyzed, and extensive numerical experiments and statistical comparisons on real delivery data verify the effectiveness of the proposed methods. Some of the methods have been deployed nationwide on the Meituan delivery platform.
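As a rough illustration of the insertion-based route construction mentioned in result (1), the Python sketch below inserts orders one at a time into a rider's route, trying every position pair that keeps each pickup before its drop-off and keeping the shortest result. It is not the dissertation's algorithm: the Euclidean distance measure, the order fields (`id`, `pickup`, `dropoff`, `deadline`), the function names, and the earliest-deadline sequencing rule are all assumptions made for this example, and the machine-learning rule selection described in the abstract is not modeled.

```python
import math


def route_length(start, route):
    """Total Euclidean length of start -> first stop -> ... -> last stop."""
    total, prev = 0.0, start
    for _, _, point in route:
        total += math.dist(prev, point)
        prev = point
    return total


def insert_order(start, route, order):
    """Insert one order's pickup and drop-off at the cheapest position pair.

    Every pair (i, j) with the pickup placed before the drop-off is tried,
    and the candidate route with the smallest total length is returned.
    """
    best_route, best_cost = None, math.inf
    n = len(route)
    for i in range(n + 1):              # index where the pickup goes
        for j in range(i + 1, n + 2):   # drop-off index, always after the pickup
            candidate = list(route)
            candidate.insert(i, (order["id"], "pickup", order["pickup"]))
            candidate.insert(j, (order["id"], "dropoff", order["dropoff"]))
            cost = route_length(start, candidate)
            if cost < best_cost:
                best_route, best_cost = candidate, cost
    return best_route


def build_route(start, orders, sort_key):
    """Sequence orders with a rule (sort_key), then insert them one by one."""
    route = []
    for order in sorted(orders, key=sort_key):
        route = insert_order(start, route, order)
    return route


# Toy data (hypothetical): two orders along a straight line from the rider.
rider_start = (0.0, 0.0)
orders = [
    {"id": "A", "pickup": (0.0, 1.0), "dropoff": (0.0, 2.0), "deadline": 30},
    {"id": "B", "pickup": (0.0, 3.0), "dropoff": (0.0, 4.0), "deadline": 20},
]

# Assumed sequencing rule: earliest deadline first.
route = build_route(rider_start, orders, sort_key=lambda o: o["deadline"])
for order_id, kind, point in route:
    print(order_id, kind, point)
```

Swapping `sort_key` for another rule (for example, distance from the rider to the pickup point) changes only the order in which orders are inserted, which is the kind of choice an adaptive rule-selection mechanism would make. On this toy data, the earliest-deadline rule inserts order B first, yet the cheapest insertion still places A's pickup and drop-off ahead of B's.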