
First-Order Methods for Online Stochastic Optimization in Dynamic Environments

Author: Shiji Zhou (周仕佶)
  • Student ID
    2020******
  • Degree
    Doctoral
  • Email
    zsj******.cn
  • Defense date
    2022.12.01
  • Advisor
    Wenwu Zhu (朱文武)
  • Discipline
    Computer Science and Technology
  • Pages
    158
  • Confidentiality level
    Public
  • Training institution
    600 Tsinghua-Berkeley Shenzhen Institute
  • Chinese keywords
    在线随机优化, 动态后悔, 多目标优化
  • English keywords
    Online Stochastic Optimization, Dynamic Regret, Multi-Objective Optimization

Abstract


Machine learning has achieved great success in many closed-world applications, where models are typically trained and tested on the same data distribution. However, a crucial challenge in deploying machine learning systems in open-world applications is coping with dynamic environments, where the data distribution continually changes over time, gradually degrading the performance of trained models. This challenge appears in a wide range of problems, such as dynamic pricing, online recommendation, quantitative trading, and self-driving systems. For example, in online recommendation, a fixed model cannot capture changing user preferences, resulting in a significant decline in recommendation accuracy over time. This calls for an online learning mechanism in which machine learning models continually adapt to changing environments.

In this thesis, we formulate the above goal as online stochastic optimization with dynamic regret, which minimizes the accumulated gap between the losses of the online decisions and the losses of the best decisions. Although this framework provides rigorous theoretical guarantees, it still faces many practical challenges in applications, such as limited labels, multiple objectives, and unknown environmental dynamics. Existing frameworks depend on the number of labels, handle only single-objective tasks, and rely on prior knowledge of the environment's dynamics when setting algorithm parameters. The corresponding theoretical frameworks and algorithm designs are open problems of continuing interest in this field.

This thesis generalizes the online stochastic optimization framework based on dynamic regret from the following perspectives, proposes several gradient-based first-order methods, and extends its applications to practical scenarios.
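The accumulated gap described above is the dynamic regret. In standard notation (the symbols here are illustrative and not taken from the thesis itself), with per-round loss functions f_t over a feasible set X, it reads:

```latex
\[
\mathrm{Regret}_T^{\mathrm{dyn}}
  = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^{*}),
\qquad
x_t^{*} \in \arg\min_{x \in \mathcal{X}} f_t(x),
\]
```

where \(x_t\) is the online decision at round \(t\); unlike static regret, the comparator \(x_t^{*}\) is allowed to change every round, which is what makes the benchmark meaningful in drifting environments.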
First, since real-time feedback is usually not fully supervised in practice, we construct a framework for online stochastic optimization with limited labels and propose an online teacher-student structure that enables self-adaptation from unlabeled data, substantially reducing the dependence on the number of labels in online learning. Second, we extend this framework with an empirical design that exploits the distribution of the received data via active learning techniques to achieve better practical performance. Then, to handle scenarios with multiple objectives, we propose a multi-objective online stochastic optimization framework. In particular, we rigorously prove that previous methods cannot converge to the optimum in stochastic settings, and we design a fundamental momentum-based correction that restores convergence without introducing additional computation or memory overhead. Finally, we demonstrate the practical advantage of applying online stochastic optimization with dynamic regret in edge caching networks, and we design an adaptive stepsize mechanism that adjusts to the dynamics of the environment, achieving consistent improvements across scenarios with different degrees of dynamics. All of the above results are corroborated by rigorous theoretical analysis and extensive experiments. Our modeling framework and theoretical analysis are general enough to provide a basis for future research.
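The momentum-based correction for the multi-objective part can be illustrated with a minimal sketch. All names below are our own illustrative choices, and the uniform objective weighting stands in for the min-norm (MGDA-style) weights that common multi-objective methods compute; the thesis's actual algorithm may differ in both respects. The key idea shown is tracking each objective's stochastic gradient with momentum before combining them, which damps the noise that biases the combination step.

```python
import numpy as np

def momentum_corrected_step(x, grads, y, beta=0.9, lr=0.01):
    """One update of a momentum-corrected multi-objective method (sketch).

    x:     (d,)   current decision variable.
    grads: (m, d) stochastic gradients, one row per objective.
    y:     (m, d) momentum trackers smoothing each objective's gradient.
    """
    # Track each objective's gradient with momentum; combining raw noisy
    # gradients is what biases naive stochastic multi-gradient methods.
    y = beta * y + (1.0 - beta) * grads
    # Combine objectives. A uniform average is used here for simplicity;
    # MGDA-style methods would instead solve for min-norm weights over y.
    w = np.full(y.shape[0], 1.0 / y.shape[0])
    d = w @ y
    return x - lr * d, y
```

Running this on two smooth objectives (e.g. quadratics pulling toward different points) drives the iterate toward a point balancing both objectives, while the trackers `y` converge to the true per-objective gradients.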