自适应流媒体视频传输是一项重要的网络应用,它通过动态选择码率档位提高用户观看体验。传统的启发式码率自适应算法由于固定参数与传统建模的限制,逐渐被数据驱动的智能码率自适应算法取代。该算法通过构建模型辅助传统算法决策或生成智能策略,显著改善了体验。然而,网络时变性、异质性与用户需求多样性等问题促使智能算法需依托闭环系统持续优化性能。因此,本文围绕高效训练模型、降低执行开销、快速适应环境和满足多样需求等四个方面,对自适应视频流智能传输的核心部分——服务端训练与客户端执行进行了优化,以改善各种场景下的用户体验并提升智能算法可行性。本文主要贡献如下:

提出基于终身模仿学习高效训练的智能码率自适应优化算法。本文通过大规模数据测量发现了真实网络分布的缓慢变化特性对现有码率自适应算法的影响,并设计了内循环与外循环系统:利用模仿学习更快且有效地训练策略,同时采用终身学习方案选择有必要的网络场景实时迭代更新策略,适应网络总体分布变化。本文显著地提升了服务端的模型训练效率。

提出结合启发式算法机制的低开销智能码率自适应融合方案。本文针对智能码率自适应算法性能优越但执行开销较高的问题,融合启发式算法机制,使用神经网络为传统启发式算法提供参数,提升性能并降低整体开销。同时提出了轻量级触发器模块,通过降低算法决策频率进一步减少执行开销。本文同时考虑算法性能和整体开销,促进智能算法在用户端部署。

提出基于元强化学习的智能码率快速自适应方法。由于每个用户网络的异质性影响算法表现,本文设计了包含离线和在线阶段的训练系统。离线阶段,利用元强化学习生成元策略,使其具备感知当前网络环境的能力。在线阶段,结合回退策略安全训练元策略,通过少量观看快速提升算法在当前网络下的性能。本文通过异质性网络下快速优化策略,提升用户体验。

提出基于自我对弈理论的需求驱动智能自适应视频传输算法。视频传输任务的需求由多个优先级相互制约的指标组成。传统的线性加权评价难以准确反映用户实际的非线性体验需求,导致优化后的算法做出相悖决策。本文基于自我对弈理论相同环境下重复采集轨迹,设计自我强化学习方法直接生成更贴近需求的策略,遂在多个视频传输任务中满足了多样化的用户需求。

Adaptive video streaming is a crucial network application that improves the viewing experience by dynamically selecting bitrate levels. Traditional heuristic Adaptive BitRate~(ABR) algorithms, limited by fixed parameters and conventional modeling, are gradually being replaced by data-driven, learning-based ABR algorithms. Such algorithms significantly improve users' quality of experience~(QoE), either by building models that assist traditional algorithms in decision-making or by generating learned policies directly. Nevertheless, network dynamics, heterogeneity, and diverse user requirements mean that learning-based algorithms must rely on a closed-loop system to keep improving their performance. This dissertation therefore optimizes the core components of learning-based adaptive video streaming, namely server-side training and client-side execution, from four perspectives: efficient model training, reduced execution overhead, fast adaptation to the environment, and support for diverse requirements. The goal is to improve user experience across scenarios and to make learning-based algorithms more practical to deploy. The main contributions are as follows.

We propose a learning-based ABR algorithm that is trained efficiently through lifelong imitation learning. Through large-scale measurement, this dissertation reveals how the slowly changing distribution of real-world networks affects existing ABR algorithms, and designs a system with an inner loop and an outer loop: imitation learning trains policies faster and more effectively, while a lifelong learning scheme selects the network scenarios that warrant retraining and updates the policy iteratively in real time to track changes in the overall network distribution. The proposed scheme significantly improves model training efficiency on the server side.

We propose a low-overhead ABR scheme that fuses learning-based decision-making with heuristic mechanisms. Learning-based ABR algorithms perform well but are expensive to execute. To address this, a neural network supplies the parameters of a traditional heuristic algorithm, which improves performance while reducing overall overhead. A lightweight trigger module further reduces execution overhead by lowering the decision frequency. By accounting for both performance and overall overhead, this work facilitates the deployment of learning-based algorithms on the client side.

We propose a fast-adaptation method for learning-based ABR based on meta-reinforcement learning. Because the heterogeneity of each user's network affects algorithm performance, this dissertation designs a training system with an offline stage and an online stage. In the offline stage, meta-reinforcement learning produces a meta-policy that can perceive the current network environment. In the online stage, the meta-policy is trained safely with the help of a fallback policy and quickly improves its performance under the current network within only a few viewing sessions. This approach improves user experience by rapidly optimizing the policy in heterogeneous networks.

We propose a requirement-driven, learning-based adaptive video transmission algorithm built on self-play. The requirement of a video transmission task consists of multiple metrics that constrain one another and carry different priorities. Traditional linearly weighted evaluation cannot accurately reflect users' actual nonlinear experience requirements, so algorithms optimized against it can make decisions that contradict those requirements. Based on self-play theory, this dissertation repeatedly collects trajectories in the same environment and designs a self-play reinforcement learning method that directly generates policies closer to the actual requirements, thereby meeting diverse user requirements across multiple video transmission tasks.
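
As a rough illustration of why a linearly weighted score can contradict a prioritized requirement, the Python sketch below compares two hypothetical sessions. Everything in it is an assumption made for illustration and is not taken from the dissertation: the `Session` type, the weights `rebuf_penalty` and `smooth_penalty`, the sample values, and the rule in `prioritized_winner` (less stalling first, then higher mean bitrate).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """Hypothetical record of one playback session (illustrative only)."""
    bitrates_mbps: List[float]  # bitrate chosen for each chunk
    rebuffer_s: List[float]     # stall time incurred while fetching each chunk


def linear_qoe(s: Session, rebuf_penalty: float = 4.3,
               smooth_penalty: float = 1.0) -> float:
    """Linearly weighted score: quality minus stall and switching penalties.

    The default weights are assumed values, not the dissertation's.
    """
    quality = sum(s.bitrates_mbps)
    stall = sum(s.rebuffer_s)
    switches = sum(abs(cur - prev)
                   for prev, cur in zip(s.bitrates_mbps, s.bitrates_mbps[1:]))
    return quality - rebuf_penalty * stall - smooth_penalty * switches


def prioritized_winner(a: Session, b: Session) -> str:
    """Assumed requirement: less total stall wins; mean bitrate breaks ties."""
    stall_a, stall_b = sum(a.rebuffer_s), sum(b.rebuffer_s)
    if stall_a != stall_b:
        return "a" if stall_a < stall_b else "b"
    mean_a = sum(a.bitrates_mbps) / len(a.bitrates_mbps)
    mean_b = sum(b.bitrates_mbps) / len(b.bitrates_mbps)
    if mean_a != mean_b:
        return "a" if mean_a > mean_b else "b"
    return "tie"


# Two made-up sessions over the same four chunks.
high = Session(bitrates_mbps=[4.3, 4.3, 4.3, 4.3], rebuffer_s=[0.0, 0.0, 1.0, 0.0])
low = Session(bitrates_mbps=[1.85, 1.85, 1.85, 1.85], rebuffer_s=[0.0, 0.0, 0.0, 0.0])

print("linear score prefers:", "high" if linear_qoe(high) > linear_qoe(low) else "low")
print("prioritized rule prefers:", {"a": "high", "b": "low", "tie": "tie"}[prioritized_winner(high, low)])
```

With these sample values the linear score favors the high-bitrate session despite its one-second stall, while the prioritized rule favors the stall-free session. A pairwise win/loss outcome of this kind is one possible training signal for a self-play learner that does not depend on hand-tuned linear weights.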