
Real-World Performance of Red-Black vs. 2-3-4 Trees, Especially Considering Cache Performance?

2019-06-01 06:49:10 | Tags: data-structures, red-black-tree, 2-3-4-tree

A single node of a 2-3-4 tree could be built from 8 pointers: up to four pointers to child nodes, up to three pointers to the actual records (whose keys either match the search key or determine which of the four children to descend into), and a parent pointer.

Common hardware today has 8-byte pointers, giving a 64-byte node. Further, modern CPUs have 64-byte cache lines. If the nodes are aligned to cache lines, each node costs only one cache-line fetch: after reading the first of its eight pointers, all the rest will already be in your L1 cache.
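To make the layout above concrete, here is a minimal sketch of such a node. The names `Record`, `Node234` and `starts_on_cache_line` are hypothetical, and the sketch assumes a 64-bit target with 64-byte cache lines, as the question does; the `static_assert`s will fail to compile on a platform where that assumption does not hold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record type: the question only says that nodes point to
// records that contain the keys.
struct Record {
    std::uint64_t key;
};

// One 2-3-4 node as described above: 4 child pointers, 3 record pointers
// and 1 parent pointer. alignas(64) asks the compiler to start every node
// on a 64-byte boundary (assumed here to be the cache-line size).
struct alignas(64) Node234 {
    Node234* child[4];
    Record*  record[3];
    Node234* parent;
};

static_assert(sizeof(void*) == 8, "sketch assumes 8-byte pointers");
static_assert(sizeof(Node234) == 64, "eight 8-byte pointers fill 64 bytes");
static_assert(alignof(Node234) == 64, "node is aligned to 64 bytes");

// True when the node starts exactly on a 64-byte boundary.
inline bool starts_on_cache_line(const Node234* n) {
    return reinterpret_cast<std::uintptr_t>(n) % 64 == 0;
}
```

Note that reaching a key still costs a second memory access in this layout, because the node stores pointers to records rather than the keys themselves.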

While a red-black tree is far simpler to implement, and small code should be fast code, each level of descent in the tree risks an L1 cache miss. For 1023 objects, a 2-3-4 tree needs a worst case of 5 nodes to be loaded into cache. A perfectly balanced binary tree would need 10, but because of imbalance a red-black tree may need more (I'm not sure of the worst case: 20?).
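The level counts above can be checked with a little integer arithmetic. The sketch below uses hypothetical helper names and the standard bounds: a tree whose nodes all have `b` children holds `b^h - 1` keys' worth of capacity in `h` levels when `b` is 2 or 4 here, and a red-black tree with `n` keys has height at most `2*log2(n+1)`. One caveat worth stating: 5 levels for 1023 keys is what a 2-3-4 tree needs when every node is full (three keys each); if every node held a single key, the same tree would have the shape of a perfect binary tree and be 10 levels deep.

```cpp
#include <cassert>
#include <cstdint>

// Smallest h with base^h >= n + 1, i.e. ceil(log_base(n + 1)),
// computed with integers to avoid floating-point rounding.
inline int ceil_log(std::uint64_t n, std::uint64_t base) {
    int h = 0;
    std::uint64_t capacity = 1;  // base^h
    while (capacity < n + 1) {
        capacity *= base;
        ++h;
    }
    return h;
}

// 2-3-4 tree in which every node is full (3 keys, 4 children):
// h levels hold 4^h - 1 keys.
inline int levels_234_all_full(std::uint64_t n) { return ceil_log(n, 4); }

// 2-3-4 tree in which every node holds one key (2 children):
// h levels hold 2^h - 1 keys, the same as a perfect binary tree.
inline int levels_234_all_sparse(std::uint64_t n) { return ceil_log(n, 2); }

// Perfectly balanced binary tree.
inline int levels_binary_perfect(std::uint64_t n) { return ceil_log(n, 2); }

// Upper bound on red-black height, 2*log2(n+1), rounded up.
// This is a bound, not necessarily a height that is reached.
inline int levels_red_black_bound(std::uint64_t n) { return 2 * ceil_log(n, 2); }
```

For `n = 1023` these give 5, 10, 10 and 20 respectively, so the guess of 20 matches the usual red-black upper bound.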

Small test harnesses that simply hammer on one data structure will probably keep it all in cache, and so may report the red-black tree as performing similarly to the 2-3-4 tree. But I have a feeling that a complicated real-world application may see much less wall-clock time and lower latency with 2-3-4 trees.

Is there any consensus or research on this?

1 Answer

Your reasoning is certainly correct: for cold lookups the 2-3-4 tree will perform better simply because it touches fewer cache lines.

If the performance of the tree is important, though, that generally means that you're using it often.

If the tree is being used often and it's not pretty much all in the cache, then it must be big. When a big tree is used often, the higher-level nodes will generally be cached, because on average a node at each level is hit twice as often as a node on the level below.
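A rough way to see this effect is to count how many complete top levels fit in a given cache budget. The sketch below is only a model: it assumes uniformly random lookups that descend to the leaves, a fully populated tree, and one cache line per node, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Probability that a lookup visits one particular node at `depth`
// (root is depth 0), under the uniform-lookup assumption above.
inline double visit_probability(int depth, int fanout) {
    double p = 1.0;
    for (int i = 0; i < depth; ++i) p /= fanout;
    return p;
}

// Number of complete top levels whose nodes fit in `budget_nodes`
// cache lines, assuming one line per node.
inline int top_levels_that_fit(std::uint64_t budget_nodes, std::uint64_t fanout) {
    int levels = 0;
    std::uint64_t used = 0;
    std::uint64_t level_size = 1;  // nodes on the next level to add
    while (used + level_size <= budget_nodes) {
        used += level_size;
        level_size *= fanout;
        ++levels;
    }
    return levels;
}
```

With a hypothetical budget of 512 lines (for example a 32 KiB cache devoted entirely to the tree, which a real application would not have), `top_levels_that_fit(512, 2)` is 9 and `top_levels_that_fit(512, 4)` is 5. Under this model only the levels below those are likely to miss.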

So the real performance improvement, when it matters, is limited to the deepest few levels of the tree. You can still see a performance win with a 2-3-4 tree, but it's not a runaway, and I think you'd need a special reason to judge it worth the extra code complexity (especially in search and iteration).
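As an illustration of the extra complexity in search, here are two lookup loops side by side. This is a sketch with hypothetical types: keys are stored inline rather than behind record pointers, the parent pointer is left out, and the red-black colour bit is omitted because a lookup never reads it.

```cpp
#include <cassert>
#include <cstdint>

// Binary (red-black style) node, lookup-relevant fields only.
struct BinNode {
    std::uint64_t key;
    BinNode* left;
    BinNode* right;
};

// One comparison decides the direction at each level.
inline const BinNode* find_binary(const BinNode* n, std::uint64_t key) {
    while (n && n->key != key)
        n = key < n->key ? n->left : n->right;
    return n;
}

// 2-3-4 node with 1 to 3 sorted keys and up to 4 children.
struct MultiNode {
    int count;              // number of keys in use (1..3)
    std::uint64_t key[3];   // sorted ascending; slots >= count are unused
    MultiNode* child[4];    // all null in a leaf
};

// Each level needs an inner scan to pick the slot before descending.
inline const MultiNode* find_234(const MultiNode* n, std::uint64_t key) {
    while (n) {
        int i = 0;
        while (i < n->count && key > n->key[i]) ++i;
        if (i < n->count && key == n->key[i]) return n;  // key found in this node
        n = n->child[i];                                 // descend into slot i
    }
    return nullptr;
}
```

The 2-3-4 version is still short, but every operation gains an inner loop over the keys of a node, and in-order iteration has to track a position inside the node as well as the node itself.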