How to efficiently construct a B+ tree from a given list?

I want to build a B+ tree from a given list of N unordered elements.

I know that the optimal bound for this is $\Theta\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ block transfers (in the external-memory model, with B elements per block and M elements of internal memory), which is also the optimal bound for sorting. So I can't simply pick each item and insert it into the tree individually, since that would cost $O(N \log_B N)$ block transfers.
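To get a feel for the gap between the two bounds, here is a back-of-the-envelope calculation. It ignores constant factors, and the parameter values (2^30 keys, 2^10 keys per block, 2^20 keys of internal memory) are illustrative assumptions, not taken from the question.

```python
import math

def sort_bound(n: int, b: int, m: int) -> float:
    """(N/B) * log_{M/B}(N/B): block transfers for optimal external sorting."""
    blocks = n / b
    return blocks * math.log(blocks, m / b)

def repeated_insert_bound(n: int, b: int) -> float:
    """N * log_B(N): block transfers for N independent root-to-leaf inserts."""
    return n * math.log(n, b)

# Illustrative parameters (assumed): N = 2^30 keys, B = 2^10 keys per block,
# M = 2^20 keys of internal memory. Constant factors are ignored.
N, B, M = 2**30, 2**10, 2**20
print(f"sort-based:  ~{sort_bound(N, B, M):.3g} transfers")
print(f"insert-each: ~{repeated_insert_bound(N, B):.3g} transfers")
```

With these numbers the sort-based bound (2^21 transfers) is about 1536 times smaller than inserting each key (3 * 2^30 transfers), because every insert pays for a full root-to-leaf path while sorting moves whole blocks at a time.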

So I figured that the best way to build the tree is to sort the elements first, since the leaves are in sorted order anyway. Beyond that, I'm at a loss.
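Sorting within the optimal bound is usually done with an external multiway merge sort. The sketch below simulates one entirely in memory, so it only illustrates the pass structure: sort memory-sized runs, then merge up to M/B - 1 runs per pass (one buffer block per input run plus one output block), which is where the $\log_{M/B}$ factor comes from. The function name and its `m` and `b` parameters are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple

def external_merge_sort(items: List[int], m: int, b: int) -> Tuple[List[int], int]:
    """In-memory simulation of external multiway merge sort.

    m: number of elements that fit in internal memory (assumed parameter).
    b: number of elements per disk block (assumed parameter).
    Returns the sorted list and the number of merge passes performed.
    """
    fan_in = m // b - 1  # one buffer block per input run, one for output
    if fan_in < 2:
        raise ValueError("need m >= 3 * b to merge at least two runs at a time")

    # Phase 1: read memory-sized chunks and sort each one into a run.
    runs = [sorted(items[i:i + m]) for i in range(0, len(items), m)]

    # Phase 2: repeatedly merge groups of fan_in runs until one run remains.
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes

data = list(range(1000, 0, -1))
result, passes = external_merge_sort(data, m=40, b=10)
print(result[:5], passes)  # prints [1, 2, 3, 4, 5] 3
```

Here 1000 keys form 25 runs of 40, and a fan-in of 3 reduces them to 9, then 3, then 1 run, so three merge passes are needed.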

I thought about something like this:

  1. Take B elements from the list.
  2. Write them out somewhere as they are (this is a leaf of the tree).
  3. Take the last (largest) element of the block; it will be a routing key in the leaf's parent.
  4. Repeat from step 1 for the next elements, until the parent holds B-1 routing keys.
  5. When the parent holds B-1 routing keys, it is full. The next routing key therefore goes to the "grandparent" instead (so the tree grows by one level), and all new leaves get a new parent.
  6. Continue like this until all N/B blocks have been read.

Basically, the problem with this is that I'm not accounting for the minimum number of children an internal node must have. For example, a node could end up with only one child, which is obviously wrong.
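One reading of steps 1-6 as code makes this problem visible. In the hypothetical `greedy_build` below, each level keeps one open node; a node is closed as soon as it holds `b` entries, and its largest key is pushed to the level above. Internal nodes are stored as the list of their children's maximum keys (the routing keys are all but the last entry). Nothing is rebalanced, so nodes on the right edge of the tree can end up underfull.

```python
from typing import List

Node = List[int]

def greedy_build(sorted_keys: List[int], b: int) -> List[List[Node]]:
    """Hypothetical one-pass build following steps 1-6, with no rebalancing.

    levels[0] holds the leaves; levels[h] (h >= 1) holds internal nodes, each
    stored as the list of its children's maximum keys.
    """
    levels: List[List[Node]] = []   # closed nodes, per level
    open_nodes: List[Node] = []     # the node currently being filled, per level

    def push(level: int, key: int) -> None:
        if level == len(open_nodes):        # first key at this height: tree grows
            open_nodes.append([])
            levels.append([])
        node = open_nodes[level]
        node.append(key)
        if len(node) == b:                  # full: close it, route its max upward
            levels[level].append(node)
            open_nodes[level] = []
            push(level + 1, node[-1])

    for key in sorted_keys:
        push(0, key)

    # Flush the partially filled right edge, bottom-up. The topmost open node
    # becomes the root; len(open_nodes) may grow while we flush.
    level = 0
    while level < len(open_nodes):
        node = open_nodes[level]
        if node:
            levels[level].append(node)
            open_nodes[level] = []
            if level < len(open_nodes) - 1:
                push(level + 1, node[-1])
        level += 1
    return levels

levels = greedy_build(list(range(1, 14)), b=4)
print(levels[0][-1])  # prints [13]
print(levels[-1])     # prints [[13]]
```

With 13 keys and b = 4, the last leaf holds a single key and the root has a single child, which is exactly the situation described above.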

I looked everywhere, but I couldn't find an algorithm that actually explains how to build the tree in $\Theta\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ block transfers. All I find are algorithms that insert each item of the list into the tree one at a time, without exploiting the B factor.

Can you help me out, or maybe point me in the right direction?

Rather than building all of the levels at the same time, which may use more than a constant number of blocks of RAM, I would build the levels from the leaves up to the root (i.e., breadth-first instead of depth-first). Given the sorted list, cut it greedily into blocks of size B. If there is only one block, that is the root. Otherwise, if the last block has too few elements, rebalance its elements with those of the second-to-last block as evenly as possible; both will then have enough elements. The next list consists of the last element of each block on this level, and the same procedure is repeated on it.
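A minimal in-memory sketch of this level-by-level procedure follows. The names `chunk_level` and `bulk_load` are hypothetical, the input is assumed to be already sorted, and the minimum occupancy of ceil(b/2) entries per non-root node is the usual B+ tree convention rather than something stated above. Each internal node is represented by the last (largest) key of each of its children, as in the answer. A disk-based version would read and write each level sequentially, and since each level is smaller than the one below by a factor of at least ceil(b/2), the total cost after sorting stays at O(N/B) transfers.

```python
from typing import List

Node = List[int]

def chunk_level(keys: List[int], b: int) -> List[Node]:
    """Cut a sorted list greedily into nodes of b entries, then fix an underfull tail."""
    nodes = [keys[i:i + b] for i in range(0, len(keys), b)]
    min_fill = (b + 1) // 2  # assumed minimum occupancy: ceil(b / 2)
    if len(nodes) >= 2 and len(nodes[-1]) < min_fill:
        # Rebalance the last two nodes as evenly as possible.
        merged = nodes[-2] + nodes[-1]
        half = (len(merged) + 1) // 2
        nodes[-2:] = [merged[:half], merged[half:]]
    return nodes

def bulk_load(sorted_keys: List[int], b: int) -> List[List[Node]]:
    """Build the tree leaf level first; levels[-1] holds the single root node."""
    if b < 2:
        raise ValueError("b must be at least 2")
    levels: List[List[Node]] = []
    keys = sorted_keys
    while True:
        nodes = chunk_level(keys, b)
        levels.append(nodes)
        if len(nodes) <= 1:          # a single node is the root
            return levels
        # The next level's input is the last (largest) key of every node.
        keys = [node[-1] for node in nodes]

levels = bulk_load(list(range(1, 14)), b=4)
print(levels[0])  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11], [12, 13]]
print(levels[1])  # [[4, 8, 11, 13]]
```

On the same 13 keys that leave a one-key leaf under the greedy scheme, the rebalancing step spreads the tail over the last two leaves (3 and 2 keys), and the root gets four children. Only the last two nodes of a level ever change after being produced, which is why a streaming version needs just a couple of blocks of memory per level.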
