
Can a B+ tree search perform better than a binary search tree search when all key-data pairs of the leaf nodes are in memory?

Assume that we are implementing a B+ tree in memory: keys are at the internal nodes and key-data pairs are in the leaf nodes. With a fan-out of f, the B+ tree will have a height of log_f N, where N is the number of keys, whereas the corresponding BST will have a height of log_2 N. If we are not doing any disk reads or writes, can B+ tree search performance be better than binary search tree search performance? How? After all, at each internal node of the B+ tree we have to decide among f choices, instead of making a single comparison as in a BST.

At least when compared to cache, main memory has many of the same characteristics as a disk drive: it has fairly high bandwidth, but much higher latency than cache. It has a fairly large minimum read size, and gives substantially higher bandwidth when reads are predictable (e.g., when you read a number of cache lines at contiguous addresses). As such, it benefits from the same general kinds of optimizations (though the details often vary a bit).

B-trees (and variants like B* and B+ trees) were explicitly designed to work well with the access patterns that disk drives support well. Since you have to read a fairly substantial amount of data anyway, you might as well pack the data to maximize what you accomplish with the memory you have to read. In both cases, you also frequently get a substantial bandwidth gain by reading some multiple of the minimum read in a predictable pattern (especially a number of successive reads at successive addresses). As such, it often makes sense to increase the size of a single page to something even larger than the minimum you can read at once.
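As a rough illustration of packing keys into the unit that gets read anyway, here is a minimal sketch of a key-only internal node sized to one cache line. The 64-byte line size, the 32-bit key type, and the names `packed_node` and `child_index` are all assumptions made for the example; child pointers would live in a separate array and are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (common, but not universal).
constexpr std::size_t kCacheLine = 64;

// Hypothetical key-only internal node that fills exactly one cache line.
// One 4-byte slot holds the key count, leaving room for 15 keys.
struct alignas(kCacheLine) packed_node {
    static constexpr std::size_t kMaxKeys =
        kCacheLine / sizeof(std::uint32_t) - 1;
    std::uint32_t count = 0;            // number of keys currently in use
    std::uint32_t keys[kMaxKeys] = {};  // sorted in ascending order
};
static_assert(sizeof(packed_node) == kCacheLine,
              "node should occupy exactly one assumed cache line");

// Picks the child to descend into: the number of keys that are <= key.
// The whole decision touches a single contiguous block of memory.
inline std::size_t child_index(const packed_node& n, std::uint32_t key) {
    assert(n.count <= packed_node::kMaxKeys);
    const std::uint32_t* first = n.keys;
    const std::uint32_t* last = n.keys + n.count;
    return static_cast<std::size_t>(std::upper_bound(first, last, key) - first);
}
```

With keys 10, 20, 30 stored in a node, `child_index` yields 0 for a search key of 5 and 2 for a search key of 25. Choosing among up to 16 children this way costs one block read, where a binary tree would typically follow about four pointers to separate nodes to narrow the range by the same amount.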

Likewise, in both cases we can plan on descending through a number of layers of nodes in the tree before we find the data we really care about. Much like when reading from disk, we benefit from maximizing the density of keys in the data we read, until we've actually found the data we care about. With a typical binary tree:

template <class T, class U>
struct node {
    T key;
    U data;
    node *left;
    node *right;
};

...we end up reading a number of data items for which we have no real use. It's only when we've found the right key that we need (or want) to get the associated data. In fairness, we can get that behavior with a binary tree as well, with only a fairly minor modification to the node structure:

template <class T, class U>
struct node {
    T key;
    U *data;
    node *left;
    node *right;
};

Now the node contains only a pointer to the data rather than the data itself. This won't accomplish anything if the data is small, but can accomplish a great deal if it's large.
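To make the size difference concrete, the two layouts can be compared with `sizeof`. This is only a sketch: the structs are renamed `inline_node` and `pointer_node` so both can coexist, and `big_payload` with its 256-byte size is a made-up example record.

```cpp
#include <cstddef>

// Hypothetical large record; the 256-byte size is an arbitrary example.
struct big_payload { char bytes[256]; };

// First layout: the data lives inside the node.
template <class T, class U>
struct inline_node {
    T key;
    U data;
    inline_node* left;
    inline_node* right;
};

// Second layout: the node holds only a pointer to the data.
template <class T, class U>
struct pointer_node {
    T key;
    U* data;
    pointer_node* left;
    pointer_node* right;
};

// How many whole nodes fit in a contiguous block of block_bytes bytes.
template <class Node>
constexpr std::size_t nodes_per_block(std::size_t block_bytes) {
    return block_bytes / sizeof(Node);
}

// Large payload: the pointer layout gives a much smaller node.
static_assert(sizeof(pointer_node<int, big_payload>) <
                  sizeof(inline_node<int, big_payload>),
              "pointer layout should shrink nodes with large data");

// Small payload: the pointer layout is no smaller than storing data inline.
static_assert(sizeof(pointer_node<int, int>) >= sizeof(inline_node<int, int>),
              "pointer layout does not help when data is small");
```

On a typical 64-bit platform the `pointer_node<int, big_payload>` comes to 32 bytes, so `nodes_per_block` reports two of them per 64-byte block, while a single `inline_node<int, big_payload>` spans several such blocks. The exact figures depend on the platform's pointer size and padding rules.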

Summary: from the viewpoint of the CPU, reads from main memory have the same basic characteristics as reads from disk; a disk just shows a more extreme version of those same characteristics. As such, most of the design considerations that led to the design of B-trees (and variants) now apply similarly to data stored in main memory.

B-trees work well and often provide substantial benefits when used for in-memory storage.
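To put rough numbers on the heights mentioned in the question, the following sketch counts how many levels are needed to distinguish N keys for a given fan-out. It assumes an idealized, perfectly full and balanced tree, and `levels_needed` is a hypothetical helper name, not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Smallest h such that fanout^h >= n, i.e. the number of levels needed to
// distinguish n keys when every node branches `fanout` ways.
// Integer arithmetic is used to avoid floating-point rounding in the log.
inline int levels_needed(std::uint64_t n, std::uint64_t fanout) {
    assert(fanout >= 2);
    int h = 0;
    std::uint64_t reach = 1;  // keys distinguishable with h levels
    while (reach < n) {
        // If the next multiplication would overflow, the true product
        // exceeds any 64-bit n, so one more level is enough.
        if (reach > UINT64_MAX / fanout) {
            ++h;
            break;
        }
        reach *= fanout;
        ++h;
    }
    return h;
}
```

For one million keys this gives 20 levels with a fan-out of 2 and 5 levels with a fan-out of 16. A binary search inside a 16-way node takes about 4 comparisons, so the total number of comparisons stays near 20 either way. What changes is the number of separate nodes visited, and therefore the number of potential cache misses: 5 instead of 20.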
