Should data in B+Tree be ordered and how much data should I be loading at a time?

Asked 2021-11-13 16:52:19 · Tags: data-structures, tree, b-tree, b-plus-tree

Within my B+Tree I have ordered keys at the leaf level, with data pointers to a separate data file. This question is about the order of the data within that data file.

As I see it, when ordered:

It's easier to load all the data in a block when one is read, as data is stored in the same order as its keys, so you only have to check whether adjacent data pointers are in the same block.
Fewer reads when accessing a lot of adjacent data or performing range queries, as the data is more likely to be contained in the same block.
Increased fragmentation and writes when deleting/splitting/inserting.

When not ordered:

Decreased writes when inserting, as data is just appended to the end of the last block associated with the node.
Increased reads when performing range queries, as the data is more likely to be split between multiple blocks.
It's a lot slower to find which entries the other data in the same block belongs to, as you have to loop through all the entries in the node checking their data pointers.
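
To make the read-count side of this trade-off concrete, here is a minimal sketch that counts how many distinct blocks a range of 32 adjacent keys touches under each layout. It assumes data pointers are byte offsets into the data file; `BLOCK_SIZE`, `RECORD_SIZE`, and the scattered slot formula are hypothetical values chosen only for illustration.

```python
BLOCK_SIZE = 4096    # assumed block size in bytes
RECORD_SIZE = 256    # assumed fixed record size, so 16 records per block
TOTAL_RECORDS = 1024 # assumed number of records in the data file


def block_of(pointer: int) -> int:
    """Block number that contains the given byte offset."""
    return pointer // BLOCK_SIZE


def blocks_to_read(pointers) -> set:
    """Distinct blocks that must be read to fetch every record pointed to."""
    return {block_of(p) for p in pointers}


# Ordered layout: the record for key k sits in slot k.
ordered_pointers = [k * RECORD_SIZE for k in range(32)]

# Unordered layout: records were appended in arrival order, so adjacent keys
# end up in unrelated slots. The multiplier 37 is just a stand-in for that.
unordered_pointers = [((k * 37) % TOTAL_RECORDS) * RECORD_SIZE for k in range(32)]

ordered_blocks = blocks_to_read(ordered_pointers)
unordered_blocks = blocks_to_read(unordered_pointers)

print("ordered layout reads", len(ordered_blocks), "blocks")
print("unordered layout reads", len(unordered_blocks), "blocks")
```

With ordered data, checking whether neighbouring pointers share a block is a single `block_of` comparison; with unordered data the same range can touch close to one block per key.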

Alternatively, should I just load entire nodes into memory when I need to access data from an entry within that node?

I'm looking for a second opinion on how I should be storing the data (ordered/unordered), and on how many data pointers I should be loading when performing a simple "get" for one value.

Thanks!

1 Answer

Alternatively, should I just load entire nodes into memory when I need to access data from an entry within that node?

Absolutely. That is the idea behind the B-tree family of data structures. It is not intended for reading/writing partial nodes (blocks) from/to slow storage. When a node is needed, it is read in its entirety into memory (if not already loaded), manipulated, and written back in its entirety to persist it.
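
A minimal sketch of that read, manipulate, write-back cycle might look like the following. The on-disk layout here (one node per 4096-byte block, a 2-byte key count followed by `(key, data pointer)` pairs) and the names `read_node` and `write_node` are assumptions made for the example, not a prescribed format; an in-memory `io.BytesIO` stands in for the index file.

```python
import io
import struct

NODE_SIZE = 4096                 # assumed: one node occupies exactly one block
HEADER = struct.Struct("<H")     # assumed header: number of entries in the node
ENTRY = struct.Struct("<qq")     # assumed entry: (key, data pointer)
MAX_ENTRIES = (NODE_SIZE - HEADER.size) // ENTRY.size


def read_node(f, node_id):
    """Read one whole node from storage and decode it into a list of entries."""
    f.seek(node_id * NODE_SIZE)
    raw = f.read(NODE_SIZE)
    (count,) = HEADER.unpack_from(raw, 0)
    return [
        ENTRY.unpack_from(raw, HEADER.size + i * ENTRY.size)
        for i in range(count)
    ]


def write_node(f, node_id, entries):
    """Encode the entries and write the whole node back in a single write."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError("node overflow: split the node before writing")
    buf = bytearray(NODE_SIZE)
    HEADER.pack_into(buf, 0, len(entries))
    for i, (key, pointer) in enumerate(entries):
        ENTRY.pack_into(buf, HEADER.size + i * ENTRY.size, key, pointer)
    f.seek(node_id * NODE_SIZE)
    f.write(buf)


index_file = io.BytesIO()        # stand-in for the real index file
write_node(index_file, 0, [(10, 0), (30, 512)])

entries = read_node(index_file, 0)   # load the full node into memory
entries.append((20, 256))            # manipulate it in memory
entries.sort()
write_node(index_file, 0, entries)   # persist the full node again

print(read_node(index_file, 0))
```

All manipulation happens on the in-memory list; storage only ever sees whole-node reads and writes.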

As the manipulation of the data in the node itself happens in memory, the choice of whether to keep the node's content sorted or not is of less importance. Reads and writes to the slow(er) storage will have a much greater effect on overall performance.

Time-complexity considerations for either choice are irrelevant too, since there is a preset maximum to the number of keys in a node. All single-node operations can therefore be considered to have constant time complexity.
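
As an illustration of why the bound matters, here is a sketch of the two in-node lookups: a binary search over sorted keys and a linear scan over unsorted entries. `MAX_KEYS` is an assumed node capacity, and the function names are hypothetical. Both loops are capped by `MAX_KEYS` regardless of how large the tree grows.

```python
from bisect import bisect_left

MAX_KEYS = 255  # assumed preset maximum number of keys per node


def find_sorted(keys, pointers, key):
    """Binary search in a sorted node: at most about log2(MAX_KEYS) steps."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return pointers[i]
    return None


def find_unsorted(entries, key):
    """Linear scan in an unsorted node: at most MAX_KEYS steps."""
    for k, pointer in entries:
        if k == key:
            return pointer
    return None


sorted_keys = [10, 20, 30]
sorted_pointers = [0, 256, 512]
unsorted_entries = [(30, 512), (10, 0), (20, 256)]

print(find_sorted(sorted_keys, sorted_pointers, 20))
print(find_unsorted(unsorted_entries, 20))
```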

You could time the different implementations and see whether that leads to a clear choice. The outcome will depend on the degree of the B-tree.
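
One rough way to run such a comparison is with the standard `timeit` module. The sketch below only times in-memory lookups within a single node for a few assumed degrees; the degrees, the probe count, and the helper names are arbitrary choices for illustration, and a real benchmark would also need to include storage I/O.

```python
import random
import timeit
from bisect import bisect_left


def find_sorted(keys, key):
    """Binary search; returns the index of key in a sorted list, or -1."""
    i = bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else -1


def find_unsorted(keys, key):
    """Linear scan; returns the index of key in an unsorted list, or -1."""
    for i, k in enumerate(keys):
        if k == key:
            return i
    return -1


def time_lookups(degree, repeats=200):
    """Time 100 lookups per repeat in one node holding `degree` keys."""
    rng = random.Random(42)  # fixed seed so runs are comparable
    keys = rng.sample(range(degree * 10), degree)
    sorted_keys = sorted(keys)
    probes = [rng.choice(keys) for _ in range(100)]
    t_sorted = timeit.timeit(
        lambda: [find_sorted(sorted_keys, p) for p in probes], number=repeats
    )
    t_unsorted = timeit.timeit(
        lambda: [find_unsorted(keys, p) for p in probes], number=repeats
    )
    return t_sorted, t_unsorted


for degree in (8, 64, 512):  # assumed degrees to compare
    t_sorted, t_unsorted = time_lookups(degree)
    print(f"degree {degree}: sorted {t_sorted:.4f}s, unsorted {t_unsorted:.4f}s")
```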