How to efficiently check whether it's height balanced for a massively skewed binary search tree?

I was reading this answer on how to check if a BST is height balanced, and was really hooked by the bonus question:

Suppose the tree is massively unbalanced. Like, a million nodes deep on one side and three deep on the other. Is there a scenario in which this algorithm blows the stack? Can you fix the implementation so that it never blows the stack, even when given a massively unbalanced tree?

What would be a good strategy here?

I am thinking of doing a level-order traversal and tracking the depth: if a leaf is found and the current node's depth is bigger than the leaf's depth + 2, then the tree is not balanced. But how can this be combined with height checking?

Edit: below is the implementation in the linked answer:

IsHeightBalanced(tree)
    return (tree is empty) or 
           (IsHeightBalanced(tree.left) and
           IsHeightBalanced(tree.right) and
           abs(Height(tree.left) - Height(tree.right)) <= 1)

To review briefly: a tree is defined as being either null or a root node with pointers .left to a left child and .right to a right child, where each child is in turn a tree, the root node appears in neither child, and no node appears in both children. The depth of a node is the number of pointers that must be followed to reach it from the root node. The height of a tree is -1 if it's null or else the maximum depth of a node that appears in it. A leaf is a node whose children are both null.

First let me note the two distinct definitions of "balanced" proposed by answerers of the linked question.

EL-balanced: A tree is EL-balanced if and only if, for every node v, |height(v.left) - height(v.right)| <= 1.

This is the balance condition for AVL trees.

DF-balanced: A tree is DF-balanced if and only if, for every pair of leaves v, w, we have |depth(v) - depth(w)| <= 1. As DF points out, DF-balance for a node implies DF-balance for all of its descendants.

I know of no algorithm that uses DF-balance, though the balance condition for binary heaps is very similar, requiring additionally that the deeper leaves be as far left as possible.

I'm going to outline three approaches to testing balance.

Size bounds for balanced trees

Expand the recursive function to have an extra parameter, maxDepth. For each recursive call, pass maxDepth - 1, so that maxDepth roughly tracks how much stack space is left. If maxDepth reaches 0, report the tree as unbalanced (e.g., by returning "infinity" for the height), since no balanced tree that fits in main memory could possibly be that tall.

This approach relies on an a priori size bound on main memory, which is available in practice if not in all theoretical models, and on the fact that no subtrees are shared. (PROTIP: unless you're very careful, your subtrees will be shared at some point during development.) We also need height bounds on balanced trees of at most a given size.
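The depth-limited recursion described above might look like the following Python sketch for the EL-balance case. The names `Node`, `UNBALANCED`, `bounded_height` and `is_el_balanced` are illustrative and not from the linked answer, and the fencepost convention is an assumption: `max_depth` counts how many levels of nodes the recursion may still enter, so a tree of height h needs `max_depth >= h + 1`.

```python
from typing import Optional

UNBALANCED = float("inf")  # sentinel "height" meaning: not balanced


class Node:
    """Hypothetical minimal tree node with .left and .right children."""

    def __init__(self, left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.left = left
        self.right = right


def bounded_height(tree: Optional[Node], max_depth: int) -> float:
    """Return the height of an EL-balanced tree, or UNBALANCED.

    The recursion never goes more than max_depth nodes deep: a non-empty
    subtree reached with max_depth == 0 is reported as unbalanced, on the
    assumption that no balanced tree in memory can be that tall.
    """
    if tree is None:
        return -1
    if max_depth == 0:
        return UNBALANCED
    left = bounded_height(tree.left, max_depth - 1)
    if left == UNBALANCED:
        return UNBALANCED
    right = bounded_height(tree.right, max_depth - 1)
    if right == UNBALANCED or abs(left - right) > 1:
        return UNBALANCED
    return 1 + max(left, right)


def is_el_balanced(tree: Optional[Node], max_depth: int) -> bool:
    return bounded_height(tree, max_depth) != UNBALANCED
```

With a small fixed `max_depth` such as 64, a chain of a million nodes is rejected after at most 64 nested calls, so the call stack stays shallow regardless of how skewed the input is.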

EL-balanced: By induction, we prove a lower bound, L(h), on the number of nodes belonging to an EL-balanced tree of a given height h.

The base cases are

L(-1) = 0
L(0) = 1,

more or less by definition. The inductive case is trickier. An EL-balanced tree of height h > 0 is a node with an EL-balanced child of height h - 1 and another EL-balanced child of height either h - 1 or h - 2. This means that

L(h) = 1 + L(h - 1) + min(L(h - 2), L(h - 1)).

Add 1 to both sides and rearrange.

L(h) + 1 = L(h - 1) + 1 + min(L(h - 2) + 1, L(h - 1) + 1).

A little while later (spoiler), we find that

L(h) >= phi^(h + 2)/sqrt(5) for h >= 1,
where phi = (1 + sqrt(5))/2 ~ 1.618.
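For readers who want the skipped step, here is one way to fill it in. This is a sketch: the shifted quantity M(h), the Fibonacci numbering F_1 = F_2 = 1, and the conjugate root psi are notation introduced here, not taken from the original answer.

```latex
\begin{align*}
M(h) &:= L(h) + 1, \qquad M(-1) = 1, \quad M(0) = 2, \\
M(h) &= M(h-1) + \min\bigl(M(h-2),\, M(h-1)\bigr)
      = M(h-1) + M(h-2)
      && \text{(since $M$ is nondecreasing)}, \\
M(h) &= F_{h+3}
      && \text{(Fibonacci numbers with } F_1 = F_2 = 1\text{)}, \\
L(h) &= F_{h+3} - 1
      = \frac{\varphi^{h+3} - \psi^{h+3}}{\sqrt{5}} - 1,
      \qquad \varphi = \tfrac{1+\sqrt{5}}{2}, \quad \psi = \tfrac{1-\sqrt{5}}{2}.
\end{align*}
```

So L(h) grows like phi^h, which is what makes the height of an EL-balanced tree logarithmic in its node count.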

maxDepth then should be set to the floor of the base-phi logarithm of the maximum number of nodes, plus a small constant that depends on fenceposty things.
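Instead of working out the logarithm and its fencepost constant by hand, the bound can be computed directly from the recurrence. The sketch below uses hypothetical helper names `min_nodes` and `max_balanced_height`; it relies on L being nondecreasing, so that min(L(h - 2), L(h - 1)) = L(h - 2).

```python
def min_nodes(h: int) -> int:
    """L(h): the fewest nodes in an EL-balanced tree of height h (h >= -1)."""
    prev, cur = 0, 1  # L(-1), L(0)
    if h == -1:
        return prev
    for _ in range(h):
        # L(h) = 1 + L(h - 1) + L(h - 2)
        prev, cur = cur, 1 + cur + prev
    return cur


def max_balanced_height(max_nodes: int) -> int:
    """Largest height h whose minimum-size tree still fits in max_nodes."""
    h = -1
    while min_nodes(h + 1) <= max_nodes:
        h += 1
    return h


# Example: a machine that can hold at most 2**32 nodes.
height_bound = max_balanced_height(2**32)
```

A recursion limit derived from `height_bound` still needs whatever offset matches how the recursive check counts levels (for instance, one more than the height if the limit counts nodes along a path).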

DF-balanced: Rather than write out an induction proof, I'm going to appeal to your intuition that the worst case is a complete binary tree with one extra leaf on the bottom. Then the proper setting for maxDepth is the base-2 logarithm of the maximum number of nodes, plus a small constant.

Iterative deepening depth-first search

This is the theoretician's version of the answer above. Because, for some reason, we don't know how much RAM our computer has (and with logarithmic space usage, it's not as though we need a tight bound), we again include the maxDepth parameter, but this time, we use it to truncate the tree implicitly below the specified depth. If the height of the tree comes back below the bound, then we know that the algorithm ran successfully. Alternatively, if the truncated tree is unbalanced, then so is the whole tree. The problem case is when the truncated tree is balanced but with height equal to maxDepth. Then we increase maxDepth and retry.

The simplest retry strategy is to increase maxDepth by 1 every time. Since balanced trees with n nodes have height O(log n), the running time is O(n log n). For DF-balanced trees, the running time is in fact O(n), since, except for the last couple of traversals, the size of the truncated tree increases by a factor of 2 each time, leading to a geometric series.

Another strategy, doubling maxDepth each time, gives an O(n) running time for EL-balanced trees, since the largest tree of height h, with 2^(h + 1) - 1 nodes, is much smaller than the smallest tree of height 2h, with approximately (phi^2)^h nodes. The downside of doubling is that we may use twice as much stack space. With increase-by-1, however, in the family of minimum-size EL-balanced trees we constructed implicitly in defining L(h), the number of nodes at depth h - k in the tree of height h is a polynomial in h of degree k. Accordingly, the last few scans will incur some superlinear term.
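A minimal Python sketch of the iterative-deepening check for EL-balance, using the doubling retry strategy. The names `Node`, `UNBALANCED`, `truncated_height` and `is_el_balanced_deepening` are illustrative, and the convention that nodes at depth greater than `max_depth` are treated as absent is an assumption about where exactly to truncate.

```python
from typing import Optional

UNBALANCED = float("inf")  # sentinel "height" meaning: not balanced


class Node:
    """Hypothetical minimal tree node with .left and .right children."""

    def __init__(self, left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.left = left
        self.right = right


def truncated_height(tree: Optional[Node], depth_left: int) -> float:
    """Height of the tree with everything below depth_left cut off.

    Returns UNBALANCED if the truncated tree violates EL-balance.
    """
    if tree is None or depth_left < 0:
        return -1  # empty, or implicitly truncated
    left = truncated_height(tree.left, depth_left - 1)
    if left == UNBALANCED:
        return UNBALANCED
    right = truncated_height(tree.right, depth_left - 1)
    if right == UNBALANCED or abs(left - right) > 1:
        return UNBALANCED
    return 1 + max(left, right)


def is_el_balanced_deepening(tree: Optional[Node]) -> bool:
    max_depth = 1
    while True:
        height = truncated_height(tree, max_depth)
        if height == UNBALANCED:
            return False  # truncated tree unbalanced, so the whole tree is
        if height < max_depth:
            return True   # nothing was cut off; the answer is exact
        max_depth *= 2    # height == max_depth: inconclusive, retry deeper
```

Replacing `max_depth *= 2` with `max_depth += 1` gives the increase-by-1 strategy, trading less stack space for more rescans.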

Temporarily mutating pointers

If there are parent pointers, then it's easy to traverse depth-first in place, because the parent pointers can be used to derive the relevant information on the stack in an efficient manner. If we don't have parent pointers but can mutate the tree temporarily, then, for descent into a child, we can cannibalize the pointer to that child to store temporarily the node's parent. The problem is determining on the way up whether we came from a left or a right child. If we can sneak a bit (say because pointers are 2-byte aligned, or because there's a spare bit in the balance factor, or because we're copying the tree for stop-and-copy garbage collection and can determine which arena we're in), then that's one way. Another test assumes that the tree is a binary search tree. It turns out that we don't need additional assumptions, however: see "Explain Morris inorder tree traversal without using stacks or recursion".

The one fly in the ointment is that this approach only works, as far as I know, on DF-balance, since there's no space on the stack to put the partial results for EL-balance.
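For the easier parent-pointer case mentioned above, a DF-balance check in constant extra space could be sketched as follows. It does not mutate any pointers and is not the Morris traversal; `PNode` and `is_df_balanced` are hypothetical names, and the check uses the leaf-depth definition of DF-balance given earlier.

```python
from typing import Optional


class PNode:
    """Hypothetical tree node that also stores a .parent pointer."""

    def __init__(self, left: Optional["PNode"] = None,
                 right: Optional["PNode"] = None) -> None:
        self.left = left
        self.right = right
        self.parent: Optional["PNode"] = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def is_df_balanced(root: Optional[PNode]) -> bool:
    """True if all leaf depths differ by at most 1; no stack, no recursion."""
    lo = hi = None  # smallest and largest leaf depth seen so far
    node, prev, depth = root, None, 0
    while node is not None:
        if prev is node.parent:  # entering node from above
            if node.left is None and node.right is None:  # a leaf
                lo = depth if lo is None else min(lo, depth)
                hi = depth if hi is None else max(hi, depth)
                if hi - lo > 1:
                    return False
            candidates = (node.left, node.right, node.parent)
        elif prev is node.left:  # coming back up from the left subtree
            candidates = (node.right, node.parent)
        else:                    # coming back up from the right subtree
            candidates = (node.parent,)
        nxt = next((c for c in candidates if c is not None), None)
        depth += -1 if nxt is node.parent else 1
        prev, node = node, nxt
    return True
```

The previously visited node tells us whether we arrived from the parent, the left child or the right child, which is exactly the information a recursion stack would otherwise carry.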
