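In recent years, as the number of internet users has kept growing, the computation resource consumption of recommendation systems, which are computationally intensive and rely heavily on artificial intelligence, has increased significantly. At the same time, under the influence of macroeconomic and environmental factors, the supply of large-scale computing power has hit a bottleneck, so improving computation efficiency has become increasingly important in traffic control for recommendation systems. Optimizing the allocation of computation resources from an offline perspective has drawn wide attention in industry: it predicts traffic and allocates resources in advance, reduces the lag of regulation, and thereby directly improves the operational efficiency and profitability of these platforms. For the offline computation resource allocation problem in recommendation-system traffic control, however, existing modeling methods are rather naive and rest on strong assumptions, so the allocation produced offline often differs considerably from the actual traffic revenue. Exploring modeling and allocation methods for the offline module of traffic control that better reflect reality is therefore urgent. Based on real data from Alipay, this thesis covers the following research content:

First, in analyzing real data from an advertising recommendation system, we find that current methods mainly rely on solutions based on MCKP mathematical modeling. These methods depend heavily on the accuracy of parameter estimates (for example, expected advertising revenue such as eCPM) and on idealized assumptions, and are insufficient to capture the complex mechanisms behind returns. We therefore propose a Transformer-based end-to-end algorithm for predicting allocation schemes; it does not depend on the actual parameters, is based on a black-box revenue function, and aims to learn a direct mapping from the estimated information we have to the actual optimal decisions. Experiments on Alipay's advertising recommendation system data show that our method improves on traditional models by more than 36.7%.

Next, we analyze the relationships between advertising traffic and system feedback data, breaking down the whole chain from computation resource decisions to the revenue finally generated by traffic. We analyze and reveal, in sequence, the relationships among computation resource allocation decisions, computation resource consumption, system load, response time ratio, and exposure rate, and model and fit these relationships using real data.

Finally, we conduct an in-depth analysis of the traffic control architecture of recommendation systems. Combining it with the preceding data-based analysis and relationship modeling, we extend the problem and propose a complete decision model that accounts for the exposure rate, merge it with the capacity planning task to build a two-stage joint optimization model for capacity planning and computation resource allocation, and reformulate the model as an equivalent linearized model. In the numerical experiments, we compare against a fixed-capacity computation resource allocation model and a conventional industry allocation model that treats response time as a direct constraint, achieving revenue improvements of 3.1% and 1.9%, respectively, while allowing response time and exposure rate metrics to vary flexibly. This demonstrates the effectiveness of our model and offers operators a methodology for comprehensively analyzing and optimizing recommendation system efficiency and increasing revenue.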
In recent years, with the continuous growth in the number of internet users, the computation resource consumption of highly AI-dependent, computationally intensive recommendation systems has increased significantly. At the same time, owing to macroeconomic and environmental factors, the supply of large-scale computation resources has hit bottlenecks, which highlights the importance of improving computation efficiency in traffic control for recommendation systems. Optimizing the allocation of computation resources from an offline perspective has therefore attracted widespread attention in industry: it allows traffic to be predicted and resources to be allocated in advance, reducing the lag of regulation and directly enhancing the operational efficiency and profitability of these platforms. However, current approaches to offline computation resource allocation in traffic control for recommendation systems rely on simple models and strong assumptions, leading to a significant discrepancy between the offline allocation results and the actual traffic revenue. There is thus an urgent need to explore more realistic modeling and allocation methods for the offline modules of traffic control. This thesis, based on real data from company A, focuses on the following research areas:

First, an analysis of real data from an advertising recommendation system shows that current methods predominantly depend on solutions that model the task as a multiple-choice knapsack problem (MCKP). These methods depend heavily on the accurate estimation of parameters (such as expected advertising revenue, e.g., eCPM) and on idealized assumptions, and they fail to capture the intricate dynamics of returns. We therefore introduce a Transformer-based end-to-end algorithm for predicting allocation schemes, which does not depend on the true parameter values and is built on a black-box revenue function.
This framework aims to learn a direct mapping from the estimated information available to us to the optimal real-world decisions. Experiments on data from company A's advertising recommendation system show that our method outperforms traditional models by more than 36.7%.

Next, we analyze the relationships between advertising traffic and system feedback data, dissecting the entire chain from computation resource decisions to the revenue generated by traffic. The relationships among computation resource allocation decisions, computation resource consumption, system load, response time ratio, and exposure rate are analyzed and revealed in sequence. These relationships are then modeled and fitted on real data to understand how each element influences overall system performance.

Finally, we analyze the traffic control architecture of recommendation systems in depth. Combining the data-based analysis and relationship modeling above, we extend the problem, propose a comprehensive decision model that takes the exposure rate into account, and integrate it with the capacity planning task to establish a two-stage joint optimization model for capacity planning and computation resource allocation. We then reformulate the model into an equivalent linearized model. In the numerical experiments, the proposed model is compared with the CRAP model, which allocates computation resources under fixed capacity, and with the CRA-RT model, which treats response time as a direct constraint. Our model achieves revenue increases of 3.1% and 1.9%, respectively, and also allows the response time and exposure rate metrics to be adjusted flexibly, demonstrating its effectiveness.
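
For readers unfamiliar with the MCKP formulation that the baseline methods mentioned above rely on, the following is a minimal sketch of a textbook dynamic program for the multiple-choice knapsack problem. It is not the method proposed in this thesis, and every name in it is a hypothetical choice for illustration: each class is assumed to stand for one traffic group, each option for a candidate computation tier given as `(cost, value)`, where `cost` is a non-negative integer amount of computation and `value` is an estimated revenue (for example, an eCPM estimate), and `budget` is the total computation available.

```python
from typing import List, Tuple

Option = Tuple[int, float]  # (integer computation cost, estimated revenue); assumed encoding


def solve_mckp(classes: List[List[Option]], budget: int) -> Tuple[float, List[int]]:
    """Choose exactly one option per class, maximizing total value within the budget."""
    neg_inf = float("-inf")
    # best[b]: best total value over the classes processed so far at total cost exactly b
    best = [neg_inf] * (budget + 1)
    best[0] = 0.0
    choices: List[List[int]] = []  # choices[k][b]: option picked for class k to reach cost b

    for options in classes:
        new_best = [neg_inf] * (budget + 1)
        pick = [-1] * (budget + 1)
        for b in range(budget + 1):
            if best[b] == neg_inf:
                continue  # cost b is unreachable with the previous classes
            for j, (cost, value) in enumerate(options):
                nb = b + cost
                if nb <= budget and best[b] + value > new_best[nb]:
                    new_best[nb] = best[b] + value
                    pick[nb] = j
        best = new_best
        choices.append(pick)

    b_star = max(range(budget + 1), key=lambda b: best[b])
    if best[b_star] == neg_inf:
        raise ValueError("no feasible allocation within the budget")

    # Walk back through the stored choices to recover the selected option per class.
    picks: List[int] = []
    b = b_star
    for k in range(len(classes) - 1, -1, -1):
        j = choices[k][b]
        picks.append(j)
        b -= classes[k][j][0]
    picks.reverse()
    return best[b_star], picks


if __name__ == "__main__":
    # Two hypothetical traffic groups with made-up (cost, value) tiers.
    demo = [[(1, 1.0), (2, 2.5), (4, 3.0)], [(1, 0.5), (2, 2.0)]]
    print(solve_mckp(demo, budget=4))
```

The sketch makes the dependence on parameter estimates visible: the returned allocation is only as good as the `value` entries supplied for each option, which is the sensitivity that motivates learning the allocation end to end.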