
R&D: HLN-Tree, Memory-efficient B+-Tree with Huge Leaf Nodes and Locality Predictors

The authors propose HLN-Trees, which support huge leaf nodes without increasing the IO size of individual index operations.

ACM Transactions on Storage has published an article written by André Brinkmann, Reza Salkhordeh, and Florian Wiegert, Institute of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany; Peng Wang, Yao Xin, Renhai Chen, and Huang Keji, Huawei Technologies, Hong Kong, China; and Gong Zhang, Huawei Technologies Co., Ltd., Shenzhen, China.

Abstract: Key-value stores in Cloud environments can contain more than 2^45 unique elements and be larger than 100 PByte. B+-Trees are well suited for these larger-than-memory datasets and seamlessly index data stored on thousands of secondary storage devices. Unfortunately, at these dataset sizes it is often uneconomical even to store all inner tree nodes in memory. Lookup performance therefore suffers from the additional IOs needed to read inner nodes.
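A back-of-envelope calculation illustrates the problem. Assuming (our numbers, not the article's) 2^45 keys, 4 KiB tree nodes, and 16-byte key/pointer entries, a plain B+-Tree needs hundreds of millions of inner nodes:

```python
# Illustrative estimate of B+-Tree inner-node memory (assumed parameters,
# not taken from the article): 2**45 keys, 4 KiB nodes, 16-byte entries.
KEYS = 2**45
NODE_BYTES = 4096
ENTRY_BYTES = 16
fanout = NODE_BYTES // ENTRY_BYTES  # 256 children per inner node

leaves = -(-KEYS // fanout)  # ceil division -> 2**37 leaf nodes

# Sum the inner levels until we reach a single root.
inner = 0
level = leaves
while level > 1:
    level = -(-level // fanout)
    inner += level

inner_bytes = inner * NODE_BYTES
print(f"inner nodes: {inner:,} (~{inner_bytes / 2**40:.1f} TiB)")
```

Under these assumptions the inner nodes alone occupy roughly 2 TiB of memory per tree, which motivates shrinking their number by enlarging leaf nodes.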

The number of inner nodes can be reduced by increasing the size of leaf nodes. We propose HLN-Trees, which support huge leaf nodes without increasing the IO sizes for individual index operations. They partition leaf nodes into arrays of independent subnodes and combine ideas from BD-trees with rebalancing, learning key deviations, and storing locality predictors. HLN-Trees were initially designed for uniform random key distributions and support arbitrary key distributions through an additional layer of hashing in leaf nodes. HLN-Trees decrease the number of inner nodes by up to 256× for uniform random key distributions and by 16× to 64× for arbitrary ones compared to B+-Trees, while keeping their performance at the same level even at high concurrency levels. We show analytically and through real-world and synthetic benchmarks that HLN-Trees also outperform state-of-the-art learned indexes for secondary storage.
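The subnode idea can be sketched as follows. This is a hypothetical, simplified illustration (names and parameters are ours, not the authors'): a huge leaf spans many fixed-size subnodes, and a hash of the key selects the single subnode an operation touches, so per-operation IO stays at one subnode regardless of leaf size, and the hashing layer spreads skewed key distributions evenly:

```python
import hashlib

# Hypothetical sketch of a huge leaf split into independent subnodes.
# Example figure: a 1 MiB leaf of 256 subnodes of 4 KiB each.
SUBNODES_PER_LEAF = 256

def subnode_index(key: bytes) -> int:
    """Hash the key to pick one subnode; the extra hashing layer makes
    non-uniform (skewed) key distributions fill subnodes evenly."""
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % SUBNODES_PER_LEAF

# In-memory stand-in for one huge leaf: each subnode is independent,
# so a lookup or update reads/writes exactly one subnode.
leaf = [dict() for _ in range(SUBNODES_PER_LEAF)]

def put(key: bytes, value) -> None:
    leaf[subnode_index(key)][key] = value

def get(key: bytes):
    return leaf[subnode_index(key)].get(key)

put(b"alpha", 1)
assert get(b"alpha") == 1
```

Because only one subnode is read per operation, the leaf itself can grow very large, which in turn shrinks the inner-node layers above it.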
