

Lower bound on heapsort?

It's well-known that the worst-case runtime for heapsort is Ω(n lg n), but I'm having trouble seeing why this is. In particular, the first step of heapsort (making a max-heap) takes time Θ(n). This is then followed by n heap deletions. I understand why each heap deletion takes time O(lg n); rebalancing the heap involves a bubble-down operation that takes time O(h) in the height of the heap, and h = O(lg n).

However, what I don't see is why this second step should take Ω(n lg n). It seems like any individual heap dequeue wouldn't necessarily cause the node moved to the top to bubble all the way down the tree.
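To make the two steps concrete, here is a minimal Python sketch of an array-based heapsort that counts bubble-down swaps separately for the build step and for the deletions. The names `sift_down` and `heapsort` and the swap counters are my own choices for illustration, not taken from any particular textbook implementation.

```python
def sift_down(a, i, size):
    """Bubble a[i] down within a[:size]; return the number of swaps made."""
    swaps = 0
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return swaps
        a[i], a[largest] = a[largest], a[i]
        i = largest
        swaps += 1


def heapsort(a):
    """Sort a in place; return (build_swaps, delete_swaps)."""
    n = len(a)
    build_swaps = 0
    # Step 1: build a max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        build_swaps += sift_down(a, i, n)
    delete_swaps = 0
    # Step 2: repeatedly move the maximum to the end and repair the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        delete_swaps += sift_down(a, 0, end)
    return build_swaps, delete_swaps


if __name__ == "__main__":
    data = [(i * 37) % 127 for i in range(127)]  # a fixed permutation of 0..126
    print(heapsort(data), data == sorted(data))
```

The question is then whether `delete_swaps` can ever be much smaller than n lg n.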

My question is: does anyone know of a good lower-bound proof for the best-case behavior of heapsort?

So I did a bit of digging myself and it looks like this result actually is fairly recent! The first lower-bound proof I can find is from 1992, though heapsort itself was invented in 1964.

The formal lower-bound proof comes from Schaffer and Sedgewick's paper "The Analysis of Heapsort". Here's a slightly paraphrased version of the proof that omits some of the technical details.

To begin, let's suppose that n = 2^k - 1 for some k, which guarantees that we have a complete binary heap. I'll show how to handle this case separately later on. Because we have 2^k - 1 elements, the first pass of heapsort will, in Θ(n), build up a heap of height k. Now, consider the first half of the dequeues from this heap, which removes 2^(k-1) nodes from the heap. The first key observation is that if you take the starting heap and then mark all of the nodes that actually end up getting dequeued, they form a subtree of the heap (i.e. every node that gets dequeued has a parent that also gets dequeued). You can see this because if this weren't the case, then there would be some node whose (larger) parent didn't get dequeued though the node itself was dequeued, meaning that the values are out of order.
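As a sanity check of the subtree observation, here is a small sketch that builds a max-heap of n = 2^k - 1 distinct keys and verifies that the largest 2^(k-1) keys are closed under taking parents. The helper names are hypothetical, and building the max-heap by running `heapq.heapify` on negated keys is just a convenience I am assuming here, not part of the proof.

```python
import heapq
import random


def build_max_heap(values):
    """Arrange values as an array-based max-heap (heapq min-heap on negated keys)."""
    neg = [-v for v in values]
    heapq.heapify(neg)
    return [-v for v in neg]


def largest_half_is_parent_closed(heap):
    """True if every non-root node among the (n + 1) // 2 largest keys
    has a parent that is also among those keys (keys assumed distinct)."""
    n = len(heap)
    half = (n + 1) // 2
    threshold = sorted(heap, reverse=True)[half - 1]
    large = [v >= threshold for v in heap]
    return all(large[(i - 1) // 2] for i in range(1, n) if large[i])


if __name__ == "__main__":
    rng = random.Random(1)
    heap = build_max_heap(rng.sample(range(10**6), 2**6 - 1))
    print(largest_half_is_parent_closed(heap))
```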

Now, consider how the nodes of this tree are distributed across the heap. If you label the levels of the heap 0, 1, 2, ..., k - 1, then there will be some number of these nodes in levels 0, 1, 2, ..., k - 2 (that is, everything except the bottom level of the tree). In order for these nodes to get dequeued from the heap, they have to get swapped up to the root, and they only get swapped up one level at a time. This means that one way to lower-bound the runtime of heapsort would be to count the number of swaps necessary to bring all of these values up to the root. In fact, that's exactly what we're going to do.

The first question we need to answer is: how many of the largest 2^(k-1) nodes are in the bottom level of the heap? We can show by contradiction that this is no greater than 2^(k-2). Suppose that there are at least 2^(k-2) + 1 of the largest nodes in the bottom level of the heap. Then each of the parents of those nodes must also be large nodes in level k - 2. Even in the best case, this means that there must be at least 2^(k-3) + 1 large nodes in level k - 2, which then means that there would be at least 2^(k-4) + 1 large nodes in level k - 3, etc. Summing up over all of these nodes, we get that there are at least 2^(k-2) + 2^(k-3) + 2^(k-4) + ... + 2^0 + k = 2^(k-1) + k - 1 large nodes. But for k > 1 this value is strictly greater than 2^(k-1), contradicting the fact that we're working with only 2^(k-1) nodes here.
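The counting step above can be checked numerically. The sketch below counts how many of the 2^(k-1) largest keys sit in the bottom level of a randomly built max-heap; the helper names and the use of `heapq` on negated keys are assumptions of mine for illustration. For n = 2^k - 1 stored in a 0-based array, the bottom level occupies indices 2^(k-1) - 1 through n - 1.

```python
import heapq
import random


def build_max_heap(values):
    """Arrange values as an array-based max-heap (heapq min-heap on negated keys)."""
    neg = [-v for v in values]
    heapq.heapify(neg)
    return [-v for v in neg]


def large_nodes_in_bottom_level(heap):
    """For n = 2^k - 1 distinct keys, count how many of the 2^(k-1) largest
    keys are stored in the bottom level of the heap."""
    n = len(heap)
    half = (n + 1) // 2                      # 2^(k-1)
    threshold = sorted(heap, reverse=True)[half - 1]
    return sum(1 for v in heap[half - 1:] if v >= threshold)


if __name__ == "__main__":
    rng = random.Random(2)
    for k in range(2, 11):
        heap = build_max_heap(rng.sample(range(10**6), 2**k - 1))
        print(k, large_nodes_in_bottom_level(heap), 2 ** (k - 2))
```

In every run the middle column should stay at or below the right-hand column, which is the 2^(k-2) cap from the argument.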

Okay... we now know that there are at most 2^(k-2) large nodes in the bottom level. This means that there must be at least 2^(k-2) of the large nodes in levels 0 through k - 2. We now ask: what is the sum, over all of these nodes, of the distance from that node to the root? Well, levels 0 through k - 4 of a complete heap hold only 2^(k-3) - 1 nodes in total, so fewer than 2^(k-3) of our 2^(k-2) nodes can sit there, and at least 2^(k-2) - 2^(k-3) = 2^(k-3) large nodes are in level k - 3 or level k - 2. Each of those is at least k - 3 swaps away from the root. Consequently, the total number of swaps that need to be performed is at least (k - 3) 2^(k-3). Since n = 2^k - 1, k = Θ(lg n), and so this value is Θ(n lg n), which gives the Ω(n lg n) bound as required.
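For an empirical look at the swap count, here is a sketch that runs only the first 2^(k-1) deletions of heapsort on n = 2^k - 1 distinct keys and counts the bubble-down swaps. It compares the count against (k - 3) * 2^(k-3), which I am using as a conservative bound of the same order as the one in the proof; the function names are hypothetical.

```python
import random


def sift_down(a, i, size):
    """Bubble a[i] down within a[:size]; return the number of swaps made."""
    swaps = 0
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return swaps
        a[i], a[largest] = a[largest], a[i]
        i = largest
        swaps += 1


def swaps_in_first_half(values):
    """Build a max-heap, then count swaps over the first (n + 1) // 2 deletions."""
    a = list(values)
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    swaps = 0
    size = n
    for _ in range((n + 1) // 2):
        size -= 1
        a[0], a[size] = a[size], a[0]    # dequeue the current maximum
        swaps += sift_down(a, 0, size)
    return swaps


if __name__ == "__main__":
    rng = random.Random(3)
    for k in range(3, 13):
        n = 2**k - 1
        observed = swaps_in_first_half(rng.sample(range(10 * n), n))
        print(k, observed, (k - 3) * 2 ** (k - 3))
```

On random inputs the observed count is usually far above the bound, since the proof only charges for moving some of the large nodes up to the root.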

A simple observation answers this. The number of items on each level of the heap is:

1
2
4
8
...
2^[log(n/4)]
and the last level has between 1 and 2^[log(n/2)] items, that is, between 1 and [n/2] items (by [] I mean the ceiling).

For example, if you have 7 items:

1
2
4

and if you have 8 items:

1
2
4
1
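The level sizes in these two examples can be reproduced with a few lines of Python. This is only an illustrative sketch; `level_sizes` is a name I made up for it.

```python
def level_sizes(n):
    """Number of items on each level of a binary heap with n items, top level first."""
    sizes = []
    width = 1                       # capacity of the current level: 1, 2, 4, ...
    while n > 0:
        sizes.append(min(width, n))
        n -= width
        width *= 2
    return sizes


if __name__ == "__main__":
    print(level_sizes(7))
    print(level_sizes(8))
```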

There are two kinds of heap tree: either at least n/4 - 1 items of the heap are in the last level, or not, in which case there are at least n/4 - 1 items in the level before the last one. In the first case it takes on the order of (n/4 - 1) * log(n/2) steps to remove the last-level items from the heap, and in the second case it takes on the order of (n/4 - 1) * log(n/4) steps to remove the items from the level before the last. So in both cases it takes Ω(n log(n)) just for n/4 - 1 items, so it's a lower bound (and it's easy to see that it's a tight lower bound).

Here is a solution that uses CLRS terms:
We start with a max-heap that is a complete binary tree with n elements.
We can say that in a complete binary tree there are n/2 leaves and n/2 inner nodes.
n/2 iterations of HEAP-SORT remove the largest n/2 elements from the heap.
Let S be the set of the largest n/2 elements.
There can be at most n/4 elements from S in the leaves, since there must be an additional n/4 of them in the inner nodes.
Let L be these n/4 largest elements from S that are in the leaves.
So if there are n/4 elements from S at level 0 (the leaves level), then there must be at least n/8 of them at level 1.
Let P be these n/8 elements from S that are at level 1.
n/2 iterations of HEAP-SORT may give the elements from L a shortcut to the root and then out of the heap, but the elements from P must make it all the way to the root before they are removed from the heap.
So there are at least (n/8)(lg n - 1) operations, which gives us a running time of Ω(n lg n).
Now for the case of a max-heap that doesn't have all its leaves at level 0.
Let k be the number of its leaves at level 0.
After k iterations of HEAP-SORT, we are left with a max-heap that is a complete binary tree with height lg n - 1.
We can continue our proof the same way.
Now for the case when there are fewer than n/4 leaves from S.
Let k be the number of elements from S that are in the leaves at level 0.
If k <= n/8 then there must be at least n/8 elements from S at level 1.
This is because there can be a total of n/4 elements above level 1.
We continue the proof the same way.
If k > n/8 then there must be at least n/16 elements from S that are at level 1.
We continue the proof the same way.
We conclude that the running time of HEAP-SORT is Ω(n lg n).
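The "n/2 leaves and n/2 inner nodes" step can be checked directly on the usual array layout of a heap. The sketch below assumes 0-based indexing, where node i has children 2i + 1 and 2i + 2 (CLRS itself uses 1-based indexing), and `count_leaves_and_inner` is a hypothetical helper name.

```python
def count_leaves_and_inner(n):
    """Return (leaves, inner) for an n-element array heap with 0-based indices.

    A node i is a leaf when its left child index 2 * i + 1 falls outside the array.
    """
    leaves = sum(1 for i in range(n) if 2 * i + 1 >= n)
    return leaves, n - leaves


if __name__ == "__main__":
    for n in (7, 8, 15, 100):
        print(n, count_leaves_and_inner(n))
```

For every n this comes out as ceil(n/2) leaves and floor(n/2) inner nodes, which is what the n/2 shorthand in the argument stands for.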
