
Construct a BTree from a sorted array

I need to construct a BTree from a sorted array. Any pointers on how I can write such an algorithm? Is there any algorithm that can take advantage of the array being sorted?

I did search Google for it but could not find an algorithm for BTrees.

@rossum's approach is simple, but it is not efficient. His proposal doesn't use the information that the list is sorted.
I assume that you have already solved the problem of equal keys (separate nodes of the BTree, or a list inside one node of the BTree).

I understand your question this way:

You want to convert a sorted list/array directly into a BTree of degree t. I only have a guess/idea:

  1. Step: Determine the minimal height h for the given number of elements n by calculating the range between the minimum and maximum number of elements for a given height of a BTree.

    n(max) = sum over i = 0..h of t^i * (t-1)

    n(min) = sum over i = 0..h-1 of ceil(t/2)^i * (ceil(t/2)-1)   // ceil the result of the divisions

  2. Step: Narrow the range between t and t/2 to t >= x(high) and x(low) >= t/2, while n stays in the range given by x(low) and x(high).

  3. Step: Recursively calculate the node at which you have to change from x(high) to x(low).

  4. Step: Construct your BTree so that it contains internal nodes of degree x(high) and leaves with x(high)-1 keys until you reach your reference node. After reaching your reference node, all internal nodes have degree x(low) = x(high)-1 and all leaf nodes have x(low) keys.

But this is only an idea. I haven't programmed it.
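Step 1 above can be tried out with a few lines of Python. This is only a sketch that takes the two sums exactly as they are written in the answer (they are not checked against a textbook definition), and the function names n_max, n_min and minimal_height are made up for illustration.

```python
import math


def n_max(t, h):
    """Maximum number of keys for height h, using the n(max) sum from step 1."""
    return sum(t**i * (t - 1) for i in range(h + 1))


def n_min(t, h):
    """Minimum number of keys for height h, using the n(min) sum from step 1."""
    half = math.ceil(t / 2)
    return sum(half**i * (half - 1) for i in range(h))


def minimal_height(n, t):
    """Smallest h whose n(max) can hold n keys (hypothetical helper)."""
    if t < 2:
        raise ValueError('t must be at least 2')
    h = 0
    while n_max(t, h) < n:
        h += 1
    return h


print(minimal_height(100, 4))  # 3, because n_max(4, 2) = 63 < 100 <= n_max(4, 3) = 255
```

Steps 2 to 4 are not covered here, since the answer leaves their details open.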

@padina You're pretty much right. I built something in Python for my work and shrank it down to the minimum code. Maybe you want to take a look at dbis-btree-git.

So it's more like pseudocode:

import random


class BTree_Creator:
    # Assumes self.height, self.valuesCountInOneNode, self.maxValueInTree and
    # self.myBBaum (a graph object offering add_node/add_edge) are set
    # elsewhere; they are not part of this shortened code.

    def valuesInFullTree(self):
        '''
        Determine the number of int values in a full tree for the given
        valuesCountInOneNode and height.

        self   : BTree_Creator
        return : int
        '''
        if self.height < 0 or self.valuesCountInOneNode < 0:
            return 0

        # level 0 (the root) holds valuesCountInOneNode values
        count = self.valuesCountInOneNode
        for level in range(1, self.height + 1):
            # level has (valuesCountInOneNode + 1)^level nodes
            count += pow(self.valuesCountInOneNode + 1, level) * self.valuesCountInOneNode
        return count

    def buildTree(self):
        '''
        This method first builds a full BTree.

        self : BTree_Creator
        '''
        neededValuesCount = self.valuesInFullTree()
        # range(1, maxValueInTree) offers only maxValueInTree - 1 candidates
        if neededValuesCount >= self.maxValueInTree:
            raise ValueError('Please use a bigger value for self.maxValueInTree, '
                             'bigger than ' + str(neededValuesCount))

        valuesInTree = random.sample(range(1, self.maxValueInTree), neededValuesCount)
        valuesInTree.sort()
        # build a BTree with only values from valuesInTree
        stackForRecursion = [(0, None, valuesInTree, 0)]
        self.rekursivSplitArray(stackForRecursion)

    def rekursivSplitArray(self, stackForRecursion):
        '''
        Split the array and generate a node in the tree from the split points.
        stackForRecursion = [(nodeName, parentNode, values, currentHeight), ( ... ]

        stackForRecursion                             : list of tuples
        (nodeName, parentNode, values, currentHeight) : (int, int/None, int list, int)
        '''
        if len(stackForRecursion) == 0:
            return
        # pop(0) takes the oldest entry, so nodes are handled level by level
        (nodeName, parentNode, values,
         currentHeight) = stackForRecursion.pop(0)

        # self.height must be defined
        if currentHeight > self.height:
            return

        valuesForNode = values

        if currentHeight < self.height:
            # there are children for the current node
            childCount = self.valuesCountInOneNode + 1
            # number of values handed to each child; exact for a full tree
            childSize = (len(values) - self.valuesCountInOneNode) // childCount
            previousSplitIdx = 0
            valuesForNode = []
            nextNodeName = nodeName * childCount

            for i in range(childCount):
                splitPointIdx = previousSplitIdx + childSize
                if i < self.valuesCountInOneNode:
                    # the value at the split point stays in this node
                    valuesForNode.append(values[splitPointIdx])

                nextNodeName += 1
                nodeTupel = (nextNodeName, nodeName,
                             values[previousSplitIdx:splitPointIdx],
                             currentHeight + 1)

                stackForRecursion.append(nodeTupel)

                previousSplitIdx = splitPointIdx + 1
        else:
            if len(valuesForNode) != self.valuesCountInOneNode:
                raise ValueError('The recursive tree build algorithm does not work')

        # add node to myBBaum
        # pay ATTENTION: add_node and add_edge lead to an error as posted
        # (presumably because myBBaum is not defined in this shortened code)
        self.myBBaum.add_node(BTree_Creator.getNodeName(nodeName), valuesForNode)
        # add edge to myBBaum
        if parentNode is not None:
            thisNodeIsChildNumber = nodeName % (self.valuesCountInOneNode + 1)
            if thisNodeIsChildNumber == 0:
                thisNodeIsChildNumber = self.valuesCountInOneNode + 1

            self.myBBaum.add_edge(BTree_Creator.getNodeName(parentNode),
                                  BTree_Creator.getNodeName(nodeName),
                                  thisNodeIsChildNumber)

        # recursive call, once per possible child; returns at once when the
        # list of pending nodes is empty
        for _ in range(0, self.valuesCountInOneNode + 1):
            self.rekursivSplitArray(stackForRecursion)

    # pass a number and get the letter combination back
    # for ex: 0=A, 1=B, 2=C, 26=AA
    @staticmethod
    def getNodeName(num):
        numOfA_s = num // 26
        return 'A' * numOfA_s + chr((num % 26) + ord('A'))

    # pass a letter combination and get the number back
    # for ex: A=0, B=1, C=2, AA=26
    @staticmethod
    def convertNodeNameInNumber(name):
        # every character before the last one is an 'A' that stands for 26
        return (len(name) - 1) * 26 + ord(name[-1]) - ord('A')
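
For comparison, the recursive split above can also be written as a small stand-alone function that returns nested dicts instead of filling a graph object. This is just a sketch under the same assumption as the code above (the list has exactly the size of a full tree); the name split_sorted and the dict layout are made up for illustration.

```python
def split_sorted(values, keys_per_node, height):
    """Turn a sorted list into a full tree of nested dicts.

    Each node is {'keys': [...], 'children': [...]}; leaves have no children.
    """
    child_count = keys_per_node + 1
    # a full tree of this height holds (keys_per_node + 1)^(height + 1) - 1 values
    if len(values) != child_count ** (height + 1) - 1:
        raise ValueError('values does not fill a full tree of this height')

    if height == 0:
        return {'keys': list(values), 'children': []}

    # number of values that go into each child subtree
    child_size = (len(values) - keys_per_node) // child_count
    keys, children = [], []
    start = 0
    for i in range(child_count):
        end = start + child_size
        children.append(split_sorted(values[start:end], keys_per_node, height - 1))
        if i < keys_per_node:
            # the value between two child slices becomes a key of this node
            keys.append(values[end])
        start = end + 1
    return {'keys': keys, 'children': children}


tree = split_sorted(list(range(8)), keys_per_node=2, height=1)
print(tree['keys'])                            # [2, 5]
print([c['keys'] for c in tree['children']])   # [[0, 1], [3, 4], [6, 7]]
```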

Here's a simple and efficient algorithm for building the B-tree. First, an observation. If you insert all the values of the sorted sequence in order, then each newly inserted element ends up in the rightmost leaf node of the tree. From there, you may have too many keys in the rightmost leaf, and you can then use the regular B-tree insertion algorithm to split the node and kick keys higher up in the tree.

So here's the algorithm: maintain a pointer to the rightmost leaf node. For each element in the sorted sequence, place that item as the largest item in the rightmost leaf. If the leaf overflows, use the regular B-tree insertion fix-up logic to split the node and kick a key higher up in the tree.
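
A minimal Python sketch of this idea might look as follows. The Node class, the max_keys parameter and the function names are assumptions made for the example, not part of any library, and the split rule (the middle key moves up) is one common choice among several.

```python
class Node:
    """Hypothetical B-tree node: sorted keys, child list, parent pointer."""

    def __init__(self):
        self.keys = []
        self.children = []
        self.parent = None


def build_from_sorted(sorted_keys, max_keys=4):
    """Build a B-tree by always appending to the rightmost leaf."""
    root = Node()
    rightmost = root                      # pointer to the rightmost leaf
    for key in sorted_keys:
        rightmost.keys.append(key)        # new key is the largest so far
        node = rightmost
        # regular fix-up: split overflowing nodes along the rightmost path
        while len(node.keys) > max_keys:
            mid = len(node.keys) // 2
            median = node.keys[mid]
            right = Node()
            right.keys = node.keys[mid + 1:]
            node.keys = node.keys[:mid]
            if node.children:
                right.children = node.children[mid + 1:]
                node.children = node.children[:mid + 1]
                for child in right.children:
                    child.parent = right
            if node.parent is None:       # the root was split: grow the tree
                root = Node()
                root.children.append(node)
                node.parent = root
            parent = node.parent
            right.parent = parent
            # node is the last child of its parent, so appending keeps order
            parent.keys.append(median)
            parent.children.append(right)
            if node is rightmost:
                rightmost = right
            node = parent
    return root


def in_order(node):
    """Return all keys of the subtree in sorted order (for checking)."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(in_order(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


tree = build_from_sorted(range(20), max_keys=4)
print(in_order(tree) == list(range(20)))  # True
```

Because every split happens on the rightmost path, the median and the new right sibling can simply be appended to the parent; no searching or shifting is needed.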

Intuitively, almost all of the insertions you do will not require a split and will run in time O(1): just follow a pointer and place the key. A small fraction will require one split that kicks a key higher up in the tree. An even smaller fraction will require two splits, an even smaller one will require three, etc. By either counting exactly how many inserts require multiple splits or using an amortized analysis, you can show that the total work required here is O(n), which is asymptotically as fast as you can get.
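
To get a feeling for the O(n) claim, the splits can be counted without building the tree at all, by tracking only how many keys each node on the rightmost path holds. This is a rough simulation under the assumption that a split moves the middle key up; count_splits and max_keys are names invented for the example.

```python
def count_splits(n, max_keys):
    """Count node splits when n sorted keys are appended one by one."""
    path = [0]       # key counts along the rightmost path, leaf first
    splits = 0
    for _ in range(n):
        path[0] += 1
        level = 0
        while path[level] > max_keys:
            mid = path[level] // 2
            # left half leaves the path, the median moves up, the rest stays
            path[level] = path[level] - mid - 1
            splits += 1
            if level + 1 == len(path):
                path.append(0)            # a new root appears
            path[level + 1] += 1
            level += 1
    return splits


for n in (10, 1000, 100000):
    print(n, count_splits(n, max_keys=4))
```

Every split leaves behind a finished node holding at least one key, so the number of splits can never exceed n, which matches the linear bound.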

