键值存储系统是支撑数据密集型应用的存储基础软件设施,其高性能、灵活易用的优势使其成为众多存储系统的后端存储引擎。对于高性能存储系统,内存在系统中具有重要的意义。然而,键值存储系统,尤其是以内存为主要存储介质的内存键值存储系统,在进一步发展的过程中面临着 DRAM 容量扩展受限的问题。高密度的持久性内存和以 RDMA 为代表的内存语义网络,为内存扩展和存储系统提供了新的机遇。然而,现有的键值存储系统,无法较好地适配新的扩展内存架构,难以充分发挥新硬件的性能优势。为此,本文重新思考键值存储系统在扩展内存下的高效设计,围绕键值存储系统的核心组件并结合硬件特性开展了研究:? 提出了面向持久性内存的日志结构键值存储系统垃圾回收机制 Pacman。Pacman 设计了一种批处理式的垃圾回收流水线,降低持久化开销,以及前后台协同的索引查找机制,降低索引查找开销;同时通过一种轻量级冷热分离机制降低持久性内存上的数据拷贝量。实验表明,Pacman 在高存储空间利用率下提高现有日志结构键值存储系统的吞吐率达 4.6 倍。? 提出了面向持久性内存的二级索引机制 Perseid。Perseid 设计持久性内存二级索引结构,将二级索引的数据项邻近组织,提高查询效率;惰性维护二级索引的一致性以保留系统的高写入性能,并引入一种轻量级的验证机制验证数据项的有效性。实验表明,Perseid 的查询性能相比于持久性内存索引提升达 7 倍,相比于日志结构合并树索引提升达 2 个数量级。? 提出了面向分离式内存的可扩展树状索引结构 Deft。Deft 重新设计索引结构的访问模式,通过简单的计算辅助寻址,减小远程访问的 I/O 放大;设计单边操作的共享排他锁,允许索引节点内的并发操作,提高并发性能;设计一种无额外开销的乐观同步机制提高读性能。实验表明,Deft 相比现有索引结构具有更高的扩展性,吞吐率提高达 9.5 倍。? 提出了面向分离式内存的计算端自适应缓存机制 Chord。Chord 缓存键值存储系统中的索引节点、键值对象地址、键值对象 3 类数据,并根据负载特征动态调整缓存容量比例,以充分发挥不同缓存的优势;设计基于版本号的自动淘汰、元数据分区等技术,降低缓存元数据管理和替换的开销。实验表明,Chord 相比于单一类型的缓存吞吐率提升达 1.8 倍。
Key-value stores are storage software infrastructure, supporting data-intensive applications. With the advantages of high performance and flexible usability, key-value stores have become the backend storage engine for many storage systems. For high-performance storage systems, memory plays a significant role. However, during further development, key-value stores, especially in-memory key-value stores that use memory as the main storage medium, are hindered by DRAM capacity scalability. High-density persistent memory and memory semantic networks such as RDMA provide new opportunities for memory expansion and storage systems. However, existing key-value stores fail to adapt well to these new extended memory architectures and cannot fully exploit the advantages of new hardware. Therefore, this dissertation rethinks how to design efficient key-value stores with extended memory, and conducts research on the core components of key-value stores:? A persistent memory-friendly garbage collection mechanism for log-structured key-value stores, Pacman, is proposed. Pacman introduces a batched garbage collection pipeline to reduce persistence overhead. It also incorporates a collaborative index lookup mechanism between foreground and background threads to reduce indexing overhead. Additionally, a lightweight hot-cold data separation mechanism is proposed to reduce data copying on persistent memory. Experiments show that Pacman improves throughput by up to 4.6× compared to existing log-structured key-value stores under high space utilization.? A secondary index mechanism based on persistent memory, Perseid, is proposed. Perseid designs a persistent memory secondary index structure that organizes data entries in a contiguous manner to improve query efficiency. It lazily maintains the consistency of the secondary index to retain high write performance of the storage system, and introduces a lightweight validation mechanism to verify the validity of data entries. Experiments show that Perseid improves query performance by up to 7× compared to persistent memory indexes and by 2 orders of magnitude compared to LSM-tree-based indexes.? A scalable tree-based index structure for disaggregated memory, Deft, is proposed. Deft redesigns the access pattern of the index structure with simple computation-assisted addressing to reduce remote access I/O amplification. It incorporates shared-exclusive locks that use one-sided operations to enable concurrent operations within index nodes, improving concurrency performance. Additionally, it devises an optimistic synchronization mechanism without additional overhead to improve read performance. Experiments show that Deft exhibits higher scalability compared to the state-of-the-art index structures, achieving throughput improvement of up to 9.5×.? A compute-sided adaptive cache mechanism for disaggregated memory, Chord, is proposed. Chord caches three types of data in key-value stores, including index nodes, object addresses, and key-value objects. It dynamically adjusts the cache capacity ratio based on workload characteristics to fully exploit the advantages of different caches. It incorporates techniques such as version-based automatic eviction and metadata partitioning to reduce the overhead of cache metadata management and replacement. Experiments show that Chord improves throughput by up to 1.8× compared to single-type caching mechanisms.