最优输运问题是近年来数学领域的热点问题,其通过最小化测度之间的传输代价给出了概率测度空间上信息丰富的Wasserstein距离度量以及合理的几何对应关系。人们通过利用最优输运的工具在各个领域实现了突破,如机器学习,反问题,信号处理等。随着相关技术的发展,计算大规模最优输运问题的需求与日俱增,发展一套低复杂度的求解算法如今已经是最优输运领域的主流研究方向之一。 在算法的层面,针对欧拉离散框架下,矩形网格上的Wasserstein-1距离计算,我们提出了一套针对不同应用需求的快速算法Fast Sinkhorn。对于均匀网格的情况,我们提出Fast Sinkhorn I算法,将Sinkhorn迭代中核矩阵与向量的乘法的时空复杂度降低至线性的$O\brac{N}$;进一步的,我们通过定义并研究矩阵的群结构,提出了Fast Sinkhorn II算法,利用Sinkhorn迭代以线性复杂度操作快速收敛到原始最优输运问题最优解;最后,在非均匀网格下,我们利用序关系对核矩阵进行了划分,实现了在非均匀网格下算法的扩展,发展了非均匀网格上的Fast Sinkhorn算法。相较于原始算法,我们提出的Fast Sinkhorn算法在时间和空间复杂度上都具有显著优势,大大降低了求解最优输运问题的耗时和内存占用,扩展了最有输运模型在大规模场景下的应用。 基于Fast Sinkhorn算法,我们提出了用于解决机器学习任务的两类应用。首先我们提出了适用于多模态任务的最优输运距离——多维分片Wasserstein距离,并基于Fast Sinkhorn算法设计了快速计算流程,保证了其在大规模场景下的实际应用。我们提出的第二个应用是条件Wasserstein差异,这是一个无模型的特征重要性计算方法,避免模型选择和训练的同时保证了对特征之间的交互作用信息的获取,同时其天然的多模态性使得利用Fast Sinkhorn算法和多维分片Wasserstein距离进行加速成为了可能。
The Optimal Transport problem has been a hot topic in the mathematics field in recent years. The Optimal Transport problem provides the informative Wasserstein distance on the space of probability measure as well as reasonable geometric correspondences by minimizing the transport cost between measures. Breakthroughs have been achieved in various fields by utilizing the Optimal Transport tools, such as machine learning, inverse problems, signal processing, etc. With the development of related technologies, the demand for computing large-scale Optimal Transport problems is increasing day by day, and developing low-complexity algorithms to solve such problems has become one of the mainstream research directions in the optimal transportation field. At the algorithmic level, for computing the Wasserstein-1 distance on rectangular meshes under the Eulerian discretization framework, we propose a set of fast algorithms called Fast Sinkhorn tailored for different application needs. For uniform meshes, we propose the Fast Sinkhorn I algorithm, reducing the time and space complexity of matrix-vector multiplications in the Sinkhorn iterations to linear O(N); furthermore, by defining and studying the group structure of special matrices, we propose the Fast Sinkhorn II algorithm, which utilizes the Sinkhorn iterations to converges rapidly to the solution of the original Optimal Transport problem with O(N) complexity operations; finally, for non-uniform meshes, we partition the kernel matrix according to the order relation to extend the algorithm, developing Fast Sinkhorn algorithms on non-uniform meshes. Compared to the original algorithm, our proposed Fast Sinkhorn algorithms have significant advantages in time and space complexity, greatly reducing the time cost and memory usage for solving Optimal Transport problems, expanding the application of Optimal Transport models to large-scale scenarios. Based on the Fast Sinkhorn algorithm, we propose two methods for solving machine learning tasks. Firstly, we propose an Optimal Transport distance for multimodal tasks - the Multidimensional Sliced Wasserstein distance, and design a fast computation procedure based on the Fast Sinkhorn algorithm to ensure its practical application in large-scale scenarios. The second method we propose is the Conditional Wasserstein Discrepancy, a model-free method evaluating the feature importance that avoids model selection and training while accounting for the interaction information between features. Meanwhile, its natural multimodality enables the usage of the Fast Sinkhorn algorithm and the Multidimensional Sliced Wasserstein distance for acceleration.