Approximate Nearest Neighbor Search (ANNS) is a key technique in big-data processing and plays an indispensable role in databases that hold massive amounts of data. By quickly and effectively retrieving objects similar to a given query vector from a collection of high-dimensional vectors, it provides critical support for information retrieval, recommendation systems, machine learning, and many other fields. However, as data volumes grow and dimensionality keeps increasing, ANNS algorithms face a series of challenges. During similarity search, an ANNS algorithm typically needs to access memory frequently to read and compare large numbers of high-dimensional vectors. This frequent memory access not only causes significant latency but also raises the energy cost of each search, severely degrading algorithm performance and user experience.

Processing-in-Memory (PIM) architectures built on DRAM have attracted wide attention for their novel computing paradigm: by executing computation directly inside memory, they reduce data transfer and movement, and can markedly improve energy efficiency and lower latency for memory-intensive applications. As a typical memory-intensive workload, ANNS is well suited to acceleration under this paradigm, yet existing PIM designs are not optimized specifically for ANNS algorithms, and applying them directly to search scenarios may introduce further challenges.

We therefore analyze the characteristics of search algorithms and propose a lightweight, incremental design based on commodity DRAM that realizes fast in-memory approximate nearest neighbor search through dedicated hardware. The architecture fully exploits the high bandwidth and low latency of DRAM, executing the key steps of the search algorithm directly in memory to substantially increase search speed. We also design a dedicated set of search commands and modify the memory-subsystem interface to ensure smooth communication and efficient cooperation between the host and DRAM.

To evaluate the performance and effectiveness of the proposed architecture, we extend an existing general-purpose DRAM system simulator and combine it with digital back-end circuit simulation to build a complete system-level simulation platform. Simulation experiments show that, compared with ANNS running on a conventional architecture, our design achieves up to a 5.8x performance improvement. In addition, compared with algorithms at a similar performance level, the architecture attains higher recall, identifying objects similar to the query vector more accurately and thereby improving the accuracy and reliability of the search results.
Approximate Nearest Neighbor Search (ANNS), as a significant technology in the field of big data processing, plays an indispensable role in databases containing vast amounts of data. It provides critical support in various domains such as information retrieval, recommendation systems, and machine learning by quickly and effectively retrieving objects similar to a given query vector from a collection of high-dimensional vectors. However, with the ever-expanding scale of data and increasing dimensionality, ANNS algorithms face several challenges. Similarity searches typically require frequent memory accesses to read and compare large amounts of high-dimensional vector data. This frequent memory access not only introduces significant latency but also increases the energy consumed per search, severely affecting the performance and user experience of the algorithm.

The Processing-in-Memory (PIM) architecture based on DRAM has garnered significant attention due to its novel computing paradigm, which reduces data transmission and movement by executing compute tasks directly within memory. This approach significantly enhances energy efficiency and reduces latency in memory-intensive applications. As a typical memory-intensive application, ANNS is well suited to acceleration through this computing paradigm. However, existing PIM designs have not been specifically optimized for ANNS algorithms, and applying them directly to search scenarios may introduce additional challenges.

Therefore, we analyze the characteristics of search algorithms and propose a lightweight, incremental design based on commodity DRAM that achieves fast in-memory ANNS through dedicated hardware. This architecture fully leverages the high bandwidth and low latency of DRAM by executing critical steps of the search algorithm directly within memory, significantly increasing search speed. Additionally, we design a specialized set of search commands and modify the memory-subsystem interface to ensure smooth communication and efficient collaboration between the host and DRAM.

To verify the performance and effectiveness of the proposed architecture, we extended an existing general-purpose DRAM system simulator and combined it with digital circuit back-end simulation to build a complete system simulation platform. Simulation experiments show that, compared with ANNS on traditional architectures, our design achieves up to a 5.8x performance improvement, allowing a large number of similarity-search tasks to be completed in less time. Moreover, compared with algorithms at a similar performance level, our architecture also achieves higher recall, identifying objects similar to the query vector more accurately and thereby enhancing the accuracy and reliability of the search results.
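To make the memory-access pattern behind these claims concrete, the following minimal Python sketch shows the core operation that nearest-neighbor search repeatedly performs and that the proposed architecture offloads to memory: reading many stored high-dimensional vectors, computing their distances to a query, and selecting the k closest. It is purely illustrative; the function name topk_nearest, the brute-force strategy, and the data sizes are assumptions for this example and are not taken from the design described above.

import numpy as np

def topk_nearest(base: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k base vectors closest to the query (L2 distance).

    Illustrative brute-force baseline: every query reads and compares the whole
    base set, which is exactly the memory-intensive access pattern that makes
    nearest-neighbor search a candidate for processing-in-memory acceleration.
    """
    # (N, d) - (d,) broadcasts to per-vector differences; sum over dimension d.
    dists = np.sum((base - query) ** 2, axis=1)
    # argpartition avoids a full sort; the k smallest distances are returned unordered.
    return np.argpartition(dists, k)[:k]

# Example: 100,000 base vectors of dimension 128 and one query (hypothetical sizes).
rng = np.random.default_rng(0)
base = rng.standard_normal((100_000, 128), dtype=np.float32)
query = rng.standard_normal(128, dtype=np.float32)
print(topk_nearest(base, query, k=10))

Practical ANNS systems replace this exhaustive scan with approximate index structures, but the underlying distance computations over large vector sets remain the dominant, memory-bound cost that the in-memory design targets.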