
How to efficiently construct a B+ tree from a given list?

I want to build a B+ tree from a given list of N unordered elements.

I know that the optimal bound for doing this is $\Theta\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ block transfers, which is also the optimal bound for sorting. So I can't simply insert the items into the tree one at a time, since that would cost $O(N \log_B N)$ block transfers.
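To make the gap between the two bounds concrete, here is a back-of-the-envelope comparison with all constants dropped. The sizes are hypothetical and measured in elements, not bytes: N = 2^30 elements, blocks of B = 2^10 elements, and a memory of M = 2^20 elements. The function names are illustrative only.

```python
import math

def sort_bound(n: float, m: float, b: float) -> float:
    """(N/B) * log_{M/B}(N/B): the external-memory sorting bound, constants dropped."""
    blocks = n / b
    return blocks * math.log2(blocks) / math.log2(m / b)

def one_by_one_bound(n: float, b: float) -> float:
    """N * log_B(N): N separate root-to-leaf insertions, constants dropped."""
    return n * math.log2(n) / math.log2(b)

# Hypothetical sizes, all in elements.
N, M, B = 2**30, 2**20, 2**10
print(sort_bound(N, M, B))                            # 2097152.0
print(one_by_one_bound(N, B))                         # 3221225472.0
print(one_by_one_bound(N, B) / sort_bound(N, M, B))   # 1536.0
```

With these sizes, inserting one item at a time costs roughly three orders of magnitude more block transfers than sorting.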

So I figured the best approach is to sort the elements first, since the leaves have to be in order anyway. Beyond that, I'm at a loss.

I thought about something like this:

  1. Take B elements from the list
  2. Write them as they are somewhere (this is a leaf of the tree)
  3. Take the last element of the block (the biggest); it will be a routing key for the parent of the leaf
  4. Repeat from step 1 with the next elements, until the parent has B-1 routing keys
  5. When the parent has B-1 routing keys, it is full. So the next routing key goes to the "grandparent" instead (the tree grows by one level), and the leaves that follow get a new parent
  6. Keep going like this until all N/B blocks have been read

The problem is that this ignores the minimum number of children an internal node must have. For example, a node could end up with only one child, which is obviously invalid.
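A tiny illustration of the problem, using a simplified one-level version of greedy grouping and assuming (hypothetically) that an internal node may have at most 3 children, so at least 2. Grouping 7 leaves greedily into parents of up to 3 children leaves the last parent with a single child. The function name `greedy_parents` is hypothetical.

```python
def greedy_parents(children: list, fanout: int) -> list:
    """Group consecutive children into parents of at most `fanout` children each."""
    return [children[i:i + fanout] for i in range(0, len(children), fanout)]

leaves = ["L0", "L1", "L2", "L3", "L4", "L5", "L6"]
parents = greedy_parents(leaves, 3)
print(parents)
# [['L0', 'L1', 'L2'], ['L3', 'L4', 'L5'], ['L6']]  <- last parent has only one child
```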

I looked everywhere, but I couldn't find an algorithm that actually explains how to build the tree in $\Theta\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ block transfers. All I find are algorithms that simply insert each item of the list into the tree, without exploiting the B factor.

Can you help me out, or at least point me in the right direction?

Rather than building all of the levels at the same time, which may use more than a constant number of blocks of RAM, I would build the levels one at a time from the leaves up to the root (i.e., breadth-first instead of depth-first). Given the sorted list, cut it greedily into blocks of size B. If there is only one block, then that's the root. Otherwise, if the last block has too few elements, rebalance its elements with those of the second-to-last block as evenly as possible; both will then have enough elements. The list for the next level consists of the last element of each block on this level; repeat the process on that list.
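A minimal sketch of this level-by-level construction in Python. Several details are assumptions not stated above: every node (leaf or internal) holds at most `b` keys, "too few" means fewer than ceil(b/2) keys, an internal node stores the largest key of each child (one key per child), and in-memory lists stand in for disk blocks. The names `cut_level` and `bulk_load` are hypothetical, and Python's `sorted` stands in for an external-memory sort.

```python
from typing import List

Block = List[int]  # one node's keys; stands in for one disk block

def cut_level(keys: List[int], b: int) -> List[Block]:
    """Greedily cut sorted keys into blocks of b, then fix an underfull last block."""
    blocks = [keys[i:i + b] for i in range(0, len(keys), b)]
    min_fill = (b + 1) // 2  # assumed minimum occupancy: ceil(b/2)
    if len(blocks) >= 2 and len(blocks[-1]) < min_fill:
        # Rebalance the last two blocks as evenly as possible; both end up >= min_fill.
        merged = blocks[-2] + blocks[-1]
        half = (len(merged) + 1) // 2
        blocks[-2:] = [merged[:half], merged[half:]]
    return blocks

def bulk_load(sorted_keys: List[int], b: int) -> List[List[Block]]:
    """Build the levels leaves-first; result[0] is the leaves, result[-1] is [root]."""
    if b < 2:
        raise ValueError("b must be at least 2, otherwise the levels never shrink")
    level = cut_level(sorted_keys, b) if sorted_keys else [[]]
    levels = [level]
    while len(level) > 1:  # a single block is the root
        # Next level's list: the last (largest) key of each block on this level.
        separators = [block[-1] for block in level]
        level = cut_level(separators, b)
        levels.append(level)
    return levels

data = [7, 3, 10, 1, 9, 4, 2, 8, 6, 5]  # unordered input
for depth, level in enumerate(bulk_load(sorted(data), 3)):
    print(depth, level)
# 0 [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
# 1 [[3, 6], [8, 10]]
# 2 [[6, 10]]
```

Since every block except possibly the last two is full, the i-th level above the input has $\lceil N/B^i \rceil$ nodes, so the whole tree has fewer than $N/(B-1)$ plus the number of levels nodes. Each level is read and written sequentially, which is O(N/B) block transfers in total, so the sort dominates the cost. The sketch keeps whole levels in memory for simplicity; a real external-memory version would stream each level through a constant number of buffers.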


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM