Is a linked list in a B-tree node superior to an array?

Original post: 2014-05-27 06:09:02 9 3 tags: algorithm / data-structures / b-tree

I want to implement a B-tree index for my database.

I have read many data structure and algorithm books to learn how to do it. All the implementations use an array to store the data and the child indexes.

Now I want to know: is a linked list in a B-tree node superior to an array? Here are the ideas I've thought about:

When splitting a node, the copy operation would be quicker than with an array.
When inserting data into the middle or at the head of the node, insertion into an array is slower than insertion into a linked list.
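To make the two ideas above concrete, here is a minimal Python sketch of a leaf node stored both ways. `ArrayNode`, `ListNode`, and their methods are hypothetical names used only for illustration, not any library's API. It suggests that the linked list's O(1) splice is offset by the linear walk needed to find the insertion point (or the split point), while the array can binary-search and then shift one contiguous block.

```python
from bisect import insort


class ArrayNode:
    """Hypothetical B-tree leaf that keeps its keys in one sorted list."""

    def __init__(self, keys=None):
        self.keys = list(keys or [])

    def insert(self, key):
        # Binary search finds the slot in O(log n) comparisons; the tail is
        # then shifted right by one slot (O(n) moves, one contiguous block).
        insort(self.keys, key)

    def split(self):
        # Move the upper half into a new right sibling; its first key is the
        # separator that would be pushed up to the parent (B+-tree style).
        mid = len(self.keys) // 2
        right = ArrayNode(self.keys[mid:])
        del self.keys[mid:]
        return right.keys[0], right


class _Cell:
    """One element of the linked list."""
    __slots__ = ("key", "next")

    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt


class ListNode:
    """Hypothetical B-tree leaf that keeps its keys in a sorted singly linked list."""

    def __init__(self):
        self.head = None
        self.size = 0

    def insert(self, key):
        # Splicing in the new cell is O(1), but locating the spot requires a
        # linear walk, because a linked list cannot be binary-searched.
        if self.head is None or key < self.head.key:
            self.head = _Cell(key, self.head)
        else:
            cur = self.head
            while cur.next is not None and cur.next.key < key:
                cur = cur.next
            cur.next = _Cell(key, cur.next)
        self.size += 1

    def split(self):
        # Cutting the chain is O(1), but reaching the midpoint is another
        # linear walk. Assumes the node holds at least two keys.
        mid = self.size // 2
        cur = self.head
        for _ in range(mid - 1):
            cur = cur.next
        right = ListNode()
        right.head, cur.next = cur.next, None
        right.size = self.size - mid
        self.size = mid
        return right.head.key, right

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.key)
            cur = cur.next
        return out
```

Both nodes end up doing O(n) work per insert or split; the array's work is a cache-friendly block move, while the list's is pointer chasing.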

3 answers

If the B-tree itself is stored on disk, a linked list will make it very complicated to maintain.

Keep the B-tree structure compact. That fits more entries per page, improves data locality, lets more nodes be cached, and means fewer disk reads and cache misses.
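A rough back-of-the-envelope sketch of the compactness point, in Python. It assumes a 4096-byte page, a 4-byte page header, 8-byte integer keys, 8-byte child page ids, and (for the linked layout) a 2-byte in-page "next" offset per entry. All of these sizes are illustrative assumptions, and the model is simplified to one key per child.

```python
import struct

# All sizes below are illustrative assumptions, not taken from any real engine.
PAGE_SIZE = 4096                        # assumed on-disk page size in bytes
HEADER = struct.calcsize("<HH")         # hypothetical header: entry count + flags = 4 bytes

# Array layout: each entry is an 8-byte key followed by an 8-byte child page id.
ARRAY_ENTRY = struct.calcsize("<qQ")    # 16 bytes ("<" = no alignment padding)

# Linked layout: the same entry plus a 2-byte in-page offset of the next entry.
LINKED_ENTRY = struct.calcsize("<qQH")  # 18 bytes


def entries_per_page(entry_size, page_size=PAGE_SIZE, header=HEADER):
    """Number of fixed-size entries that fit in one page after the header."""
    return (page_size - header) // entry_size


print("array :", entries_per_page(ARRAY_ENTRY))   # 255
print("linked:", entries_per_page(LINKED_ENTRY))  # 227
```

Under these assumptions the array layout holds 255 entries per page and the linked layout 227, so every page read from disk carries about 11% fewer keys.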

Use an array. 使用数组。

The perceived in-memory computational benefits are inconsequential compared with the cost of disk I/O.

So, in short: no, a linked list is not superior.

A linked list is not better. In fact, a plain array is not the best choice either, except for its simplicity (a good argument in its favor) and its fast search when the keys are sorted.
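The "fast search when sorted" point can be illustrated by counting comparisons. This Python sketch (the function names are hypothetical) runs a binary search over a sorted array and a sequential scan, which is all a singly linked list supports, over the same 1,000 keys.

```python
def binary_search(keys, key):
    """Binary search on a sorted array.

    Returns (index or -1, number of < comparisons made).
    """
    lo, hi, comparisons = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(keys) and keys[lo] == key
    return (lo if found else -1), comparisons


def linear_search(keys, key):
    """Sequential scan, the only option on a singly linked list.

    Returns (index or -1, number of < comparisons made).
    """
    comparisons = 0
    for i, k in enumerate(keys):
        comparisons += 1
        if not k < key:  # first key >= target: stop scanning
            return (i if k == key else -1), comparisons
    return -1, comparisons


keys = list(range(1000))
print(binary_search(keys, 999))  # (999, 10)
print(linear_search(keys, 999))  # (999, 1000)
```

For the last key, the scan makes 1,000 `<` comparisons while the binary search makes 10.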

Keep in mind that the "array" implementation is more of a reference implementation than a full-strength, production one. For example, commercial implementations use many strategies for laying out the key/data pairs inside a B-tree node, in order to solve two problems: storage efficiency and efficient key search within the node.

Regarding efficient search: an array of key/value pairs with an internal balanced tree structure on top of it allows insertion, deletion, and search in O(log N), where N is the number of entries in the node. For large B-tree nodes this makes sense.

Regarding memory efficiency, the nature of the keys and values matters a lot. For example, lexicographically ordered keys can be shortened by factoring out a common prefix (e.g. "good" and "great" share "g"), and the data can also be compressed with any scheme suited to its nature. Compressing keys is more complex, because you need to preserve their lexicographical order. Remember that the more keys and data you fit in a node, the fewer disk accesses you need.
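A minimal Python sketch of the common-prefix idea from the example above. `compress_keys` and `decompress_keys` are hypothetical helpers; real engines use more elaborate schemes. Because every key in the node shares the stored prefix, comparing the suffixes orders them exactly as the full keys, which is the lexicographic property that has to be preserved.

```python
import os


def compress_keys(keys):
    """Factor out the longest common prefix of a node's keys.

    Returns (prefix, suffixes). os.path.commonprefix works character by
    character on any list of strings and returns "" for an empty list.
    """
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]


def decompress_keys(prefix, suffixes):
    """Rebuild the full keys by re-attaching the shared prefix."""
    return [prefix + s for s in suffixes]


prefix, suffixes = compress_keys(["good", "great"])
print(prefix, suffixes)  # g ['ood', 'reat']
```

A search key that does not start with the node's prefix sorts entirely before or after every key in the node, so it can be handled without decompressing anything.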

The time to split a node matters only partially, because it is several orders of magnitude smaller than the time to read or write a node on typical media. For SSDs and extremely fast disks (disks as fast as RAM are expected within 10 to 20 years), a lot of research is under way to find a successor to B-trees; stratified B-trees are one example.

B-trees are typically used in databases where the data is stored on disk and you want to minimize the number of blocks you read. I do not think your proposal would be efficient in that case (although it might help if all the data fits in RAM).
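To see why the block count dominates, here is an idealized Python sketch of how many levels, and hence roughly how many block reads per uncached lookup, a tree needs for a given number of keys and fan-out. It assumes every node is completely full, which real B-trees are not; `levels_needed` is a hypothetical helper.

```python
def levels_needed(n_keys, fanout):
    """Smallest number of levels L with fanout**L >= n_keys.

    Idealized: assumes every node is completely full. On a lookup that
    misses the cache, each level costs roughly one block read.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, reach = 1, fanout
    while reach < n_keys:
        reach *= fanout
        levels += 1
    return levels


print(levels_needed(10**9, 255))  # 4
print(levels_needed(10**9, 2))    # 30
```

With these idealized numbers, a billion keys need 4 levels at a fan-out of 255 but 30 levels in a binary tree.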

If you want to perform those two operations efficiently, you should use a skip list (http://en.wikipedia.org/wiki/Skip_list). Performance-wise, it will be similar to what you have outlined.
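For reference, here is a minimal in-memory skip list in Python. It is a generic textbook sketch, not code from the answer, with insertion and lookup only (no deletion). Each node gets a random number of levels, and a search drops down level by level, giving expected O(log n) search and insert. The class names, the 16-level cap, and the 0.5 promotion probability are illustrative assumptions.

```python
import random


class _SkipNode:
    __slots__ = ("key", "forward")

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] = next node on level i


class SkipList:
    """Minimal sorted skip list: insert and membership test, no deletion."""

    MAX_LEVEL = 16  # assumed cap on node height
    P = 0.5         # assumed probability of promoting a node one more level

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = _SkipNode(None, self.MAX_LEVEL)  # sentinel, holds no key
        self.level = 1                               # levels currently in use

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and self.rng.random() < self.P:
            lvl += 1
        return lvl

    def insert(self, key):
        # update[i] = last node on level i whose key is < key.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        # Levels above the current height keep the head as predecessor.
        self.level = max(self.level, lvl)
        new = _SkipNode(key, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __contains__(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        # Level 0 is an ordinary sorted linked list of all keys.
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Like the linked list in the question, this is a pointer-based structure, so the point about keeping on-disk structures compact still applies if the data does not fit in RAM.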