
How can std::make_heap be implemented while making at most 3N comparisons?

I looked into the C++0x standard and found the requirement that make_heap should do no more than 3*N comparisons.

I.e. heapifying an unordered collection can be done in O(N).

   /*  @brief  Construct a heap over a range using comparison functor.

Why is this?

The source (g++ 4.4.3) gives me no clues.

The while (true) loop and the __parent == 0 check are not clues, but rather my guess at where the O(N) behaviour comes from:

template<typename _RandomAccessIterator, typename _Compare>
void
make_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
          _Compare __comp)
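{
  typedef typename iterator_traits<_RandomAccessIterator>::value_type
      _ValueType;
  typedef typename iterator_traits<_RandomAccessIterator>::difference_type
      _DistanceType;

  if (__last - __first < 2)
    return;

  const _DistanceType __len = __last - __first;
  _DistanceType __parent = (__len - 2) / 2;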
  while (true)
    {
      _ValueType __value = _GLIBCXX_MOVE(*(__first + __parent));
      std::__adjust_heap(__first, __parent, __len, _GLIBCXX_MOVE(__value),
                 __comp);
      if (__parent == 0)
        return;
      __parent--;
    }
}

__adjust_heap looks like a log N method:

while ( __secondChild < (__len - 1) / 2)
{
    __secondChild = 2 * (__secondChild + 1);

That is a bog-standard log N loop to me.

  template<typename _RandomAccessIterator, typename _Distance,
           typename _Tp, typename _Compare>
    void
    __adjust_heap(_RandomAccessIterator __first, _Distance __holeIndex,
                  _Distance __len, _Tp __value, _Compare __comp)
    {
      const _Distance __topIndex = __holeIndex;
      _Distance __secondChild = __holeIndex;
      while (__secondChild < (__len - 1) / 2)
        {
          __secondChild = 2 * (__secondChild + 1);
          if (__comp(*(__first + __secondChild),
                     *(__first + (__secondChild - 1))))
            __secondChild--;
          *(__first + __holeIndex) = _GLIBCXX_MOVE(*(__first + __secondChild));
          __holeIndex = __secondChild;
        }
      if ((__len & 1) == 0 && __secondChild == (__len - 2) / 2)
        {
          __secondChild = 2 * (__secondChild + 1);
          *(__first + __holeIndex) = _GLIBCXX_MOVE(*(__first
                                                     + (__secondChild - 1)));
          __holeIndex = __secondChild - 1;
        }
      std::__push_heap(__first, __holeIndex, __topIndex,
                       _GLIBCXX_MOVE(__value), __comp);
    }
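To see why this stays cheap, it may help to strip __adjust_heap down to plain ints. The following is a simplified, non-template reimplementation for a max-heap stored in a std::vector<int>; the names adjust_heap_sketch and make_heap_sketch are hypothetical and not libstdc++ API. It mirrors the two phases of the code above. First, the hole descends to a leaf using one child-versus-child comparison per level, and the saved value is never compared on the way down. Then the value is sifted back up, which is what __push_heap does.

```cpp
#include <algorithm>  // std::is_heap / std::sort, handy for checking results
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, int-only sketch of the __adjust_heap strategy (max-heap, 0-indexed).
// `comparisons` is incremented once per element comparison.
void adjust_heap_sketch(std::vector<int>& a, std::size_t hole, int value,
                        std::size_t& comparisons) {
    const std::size_t len = a.size();
    const std::size_t top = hole;

    // Phase 1: while the hole has two children, move the larger child up.
    // One comparison per level; `value` itself is not compared here.
    while (2 * hole + 2 < len) {
        std::size_t child = 2 * hole + 2;      // right child
        ++comparisons;
        if (a[child] < a[child - 1]) --child;  // left child is larger
        a[hole] = a[child];
        hole = child;
    }
    // A lone left child (only possible when len is even) moves up for free.
    if (2 * hole + 1 < len) {
        a[hole] = a[2 * hole + 1];
        hole = 2 * hole + 1;
    }

    // Phase 2: sift `value` up from the leaf, never above `top`
    // (the job __push_heap does in libstdc++).
    while (hole > top) {
        const std::size_t parent = (hole - 1) / 2;
        ++comparisons;
        if (!(a[parent] < value)) break;
        a[hole] = a[parent];
        hole = parent;
    }
    a[hole] = value;
}

// Same loop shape as the make_heap above: last parent first, down to index 0.
void make_heap_sketch(std::vector<int>& a, std::size_t& comparisons) {
    if (a.size() < 2) return;
    std::size_t parent = (a.size() - 2) / 2;
    while (true) {
        adjust_heap_sketch(a, parent, a[parent], comparisons);  // a[parent] copied by value
        if (parent == 0) return;
        --parent;
    }
}
```

Each call does at most one comparison per level going down and at most one per level coming back up, so it costs at most twice the height of its starting node. Summed over all parents, that stays below 2N, which is comfortably inside the 3N limit.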

Any clues as to why this makes at most 3N comparisons would be appreciated.
EDIT:

Experimental results:

This actual implementation uses

  • fewer than 2N comparisons to heapify an input that is already a heap
  • fewer than 1.5N comparisons to heapify a heap given in reverse order
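Counts like these can be reproduced by wrapping the comparison in a counter. This is a minimal sketch assuming a C++11 or later standard library; count_make_heap_comparisons and ascending are hypothetical helper names. The lambda captures the counter by reference, so any copies of the comparator made inside the library still increment the same variable. Exact counts vary between library implementations, so only the standard's limit of at most 3 * (last - first) comparisons is worth asserting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Calls std::make_heap on a copy of `input` and returns how many times the
// comparator was invoked.
std::size_t count_make_heap_comparisons(std::vector<int> input) {
    std::size_t count = 0;
    // Captured by reference: internal copies of the lambda share one counter.
    std::make_heap(input.begin(), input.end(), [&count](int x, int y) {
        ++count;
        return x < y;
    });
    return count;
}

// Hypothetical helper producing 0, 1, ..., n-1.
std::vector<int> ascending(std::size_t n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return v;
}
```

Printing the counts for ascending input, descending input, an existing heap and a reversed heap over a few sizes is a quick way to compare a given library against the numbers quoted above.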

A binary heap over n elements can be created in O(n) time using a clever algorithm and a clever analysis. In what follows I'm just going to talk about how this works assuming that you have explicit nodes and explicit left and right child pointers, but this analysis is still perfectly valid once you compress it into an array.

The algorithm works as follows. Start off by taking about half of the nodes and treating them as singleton max-heaps: since there's only one element, the tree containing just that element must automatically be a max-heap. Now, take these trees and pair them off with one another. For each pair of trees, take one of the values that you haven't used yet and execute the following algorithm:

  1. Make the new node the root of the heap, having its left and right child pointers refer to the two max-heaps.

  2. While this node has a child that's larger than it, swap the node with its larger child.
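A minimal sketch of these two steps on explicit nodes, as the answer assumes. Node, join_heaps and is_max_heap are hypothetical names, values are plain ints, and the function returns the number of swaps it made so the O(h) claim below can be checked on small examples.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <utility>

// Hypothetical explicit-node representation of a max-heap.
struct Node {
    int value;
    std::unique_ptr<Node> left, right;
    explicit Node(int v) : value(v) {}
};

// Step 1: make `root` the parent of the two existing max-heaps.
// Step 2: while a child is larger, swap the value with the larger child.
// Returns the number of swaps, which is at most the height of the subheaps + 1.
std::size_t join_heaps(Node& root, std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
    root.left = std::move(a);
    root.right = std::move(b);
    std::size_t swaps = 0;
    Node* cur = &root;
    while (true) {
        Node* larger = cur->left.get();
        if (cur->right && (!larger || larger->value < cur->right->value))
            larger = cur->right.get();
        if (!larger || !(cur->value < larger->value)) break;  // heap property holds
        std::swap(cur->value, larger->value);
        cur = larger;
        ++swaps;
    }
    return swaps;
}

// Recursively checks that no child is larger than its parent.
bool is_max_heap(const Node* n) {
    if (!n) return true;
    for (const Node* c : {n->left.get(), n->right.get()})
        if (c && n->value < c->value) return false;
    return is_max_heap(n->left.get()) && is_max_heap(n->right.get());
}
```

For example, joining the heaps 8(2, 7) and 6(4, 1) under a new root 0 moves 8 to the top and 7 up one level, for two swaps in total.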

My claim is that this procedure ends up producing a new max-heap containing the elements of the two input max-heaps, and it does so in time O(h), where h is the height of the two heaps. The proof is an induction on the height of the heaps. As a base case, if the subheaps are empty, then the algorithm terminates immediately with a singleton max-heap, and it does so in O(1) time. For the inductive step, assume that for some h, this procedure works on any subheaps of height h, and consider what happens when you execute it on two heaps of height h + 1. When we add a new root to join together two subtrees of height h + 1, there are three possibilities:

  1. The new root is larger than the roots of both subtrees. In this case we already have a new max-heap, since the root is larger than every node in either subtree (by transitivity).

  2. The new root is larger than one child and smaller than the other. Then we swap the root with the larger child and recursively execute this procedure again, using the old root and the child's two subtrees, each of which has height h. By the inductive hypothesis, this means that the subtree we swapped into is now a max-heap. Thus the overall heap is a max-heap: the new root is larger than everything in the subtree we swapped with (since it's larger than the node we added and was already larger than everything in that subtree), and it's also larger than everything in the other subtree (since it's larger than the node we added, which was larger than the other subtree's root, and that root was larger than everything in its subtree).

  3. The new root is smaller than both its children. Then, using a slightly modified version of the above analysis (swapping with the larger of the two children), we can show that the resulting tree is indeed a heap.

Moreover, since at each step the heights of the child heaps decrease by one, the overall runtime for this algorithm must be O(h).


At this point, we have a simple algorithm for making a heap:

  1. Take about half the nodes and create singleton heaps. (You can compute explicitly how many nodes will be needed here, but it's about half.)
  2. Pair those heaps off, then merge them together by using one of the unused nodes and the above procedure.
  3. Repeat step 2 until a single heap remains.
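In the array layout these steps collapse into a single loop. This sketch assumes a 0-indexed max-heap of ints in which node i has children 2i + 1 and 2i + 2; sift_down and build_heap are hypothetical names, and sift_down is the textbook version (up to two comparisons per level), not the libstdc++ code from the question.

```cpp
#include <algorithm>  // std::is_heap, handy for checking results
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Textbook sift-down on a 0-indexed max-heap; counts element comparisons
// (up to two per level: child vs child, then parent vs larger child).
void sift_down(std::vector<int>& a, std::size_t i, std::size_t& comparisons) {
    const std::size_t n = a.size();
    while (2 * i + 1 < n) {
        std::size_t larger = 2 * i + 1;
        if (larger + 1 < n) {
            ++comparisons;
            if (a[larger] < a[larger + 1]) ++larger;
        }
        ++comparisons;
        if (!(a[i] < a[larger])) return;  // heap property restored
        std::swap(a[i], a[larger]);
        i = larger;
    }
}

// Indices n/2 .. n-1 are leaves: the singleton heaps of step 1. Each i < n/2 is
// an "unused node" that joins the heaps rooted at 2i+1 and 2i+2; going from
// n/2 - 1 down to 0 guarantees both child heaps are finished first (steps 2-3).
void build_heap(std::vector<int>& a, std::size_t& comparisons) {
    for (std::size_t i = a.size() / 2; i-- > 0;)
        sift_down(a, i, comparisons);
}
```

The loop order matches the libstdc++ make_heap in the question: its first parent index (__len - 2) / 2 equals n/2 - 1 for n >= 2.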

Since at each step we know that the heaps we have so far are valid max-heaps, eventually this produces a valid overall max-heap. If we're clever about how many singleton heaps we make, this will end up creating a complete binary tree as well.

However, it seems like this should run in O(n lg n) time, since we do O(n) merges, each of which runs in O(h), and in the worst case the height of the trees we're merging is O(lg n). But this bound is not tight, and we can do a lot better by being more precise with the analysis.

In particular, let's think about how deep all the trees we merge are. About half the heaps have depth zero, then half of what's left has depth one, then half of what's left has depth two, etc. If we sum this up, we get the sum

$$0 \cdot \frac{n}{2} + 1 \cdot \frac{n}{4} + 2 \cdot \frac{n}{8} + \cdots + k \cdot \frac{n}{2^{k+1}} + \cdots = \sum_{k=0}^{\lceil \log n \rceil} \frac{nk}{2^{k+1}} = n \sum_{k=0}^{\lceil \log n \rceil} \frac{k}{2^{k+1}}$$
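The sum can be sanity-checked by computing node heights directly. This sketch assumes a 0-indexed complete binary tree stored as an array (node i has children 2i + 1 and 2i + 2); node_height and sum_of_heights are hypothetical helpers. Roughly n/2^(k+1) nodes have height k, and sifting a new root down costs at most its height, so the total of all heights is the quantity the sum estimates.

```cpp
#include <cassert>
#include <cstddef>

// Height of node i in a complete binary tree with n nodes: the number of edges
// on a path to its deepest leaf. In a complete tree the leftmost path is always
// a deepest one, so following left children is enough.
std::size_t node_height(std::size_t i, std::size_t n) {
    std::size_t h = 0;
    while (2 * i + 1 < n) {
        i = 2 * i + 1;
        ++h;
    }
    return h;
}

// Total number of levels all the merges could sift through.
std::size_t sum_of_heights(std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += node_height(i, n);
    return total;
}
```

For example, sum_of_heights(7) is 4 and sum_of_heights(8) is 7; the total stays below n, consistent with the infinite series n Σ k/2^(k+1) being exactly n.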

This sum upper-bounds the number of swaps made. Each swap requires at most two comparisons. Therefore, if we multiply the above sum by two, we get the following summation, which upper-bounds the number of comparisons made:

$$2n \sum_{k=0}^{\lceil \log n \rceil} \frac{k}{2^{k+1}} = n \sum_{k=0}^{\lceil \log n \rceil} \frac{k}{2^k} \le n \sum_{k=0}^{\infty} \frac{k}{2^k}$$

The summation here is $\frac{0}{2^0} + \frac{1}{2^1} + \frac{2}{2^2} + \frac{3}{2^3} + \cdots$. This is a famous summation that can be evaluated in multiple different ways. One way to evaluate it is given in these lecture slides, slides 45-47. It comes out to exactly 2, so the whole expression is at most 2n, which means that the number of comparisons that end up getting made is certainly bounded from above by 3n.
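For completeness, here is one standard way to evaluate the series (not necessarily the method used in the slides): subtract half of the series from itself. The k = 0 term is zero, so the sum can start at k = 1.

$$
\begin{aligned}
S &= \sum_{k=1}^{\infty} \frac{k}{2^k}, \qquad \frac{S}{2} = \sum_{k=1}^{\infty} \frac{k}{2^{k+1}} = \sum_{k=2}^{\infty} \frac{k-1}{2^k} \\
S - \frac{S}{2} &= \frac{1}{2} + \sum_{k=2}^{\infty} \frac{k-(k-1)}{2^k} = \sum_{k=1}^{\infty} \frac{1}{2^k} = 1 \\
S &= 2
\end{aligned}
$$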

Hope this helps!

@templatetypedef has already given a good answer for why the asymptotic run time of build_heap is O(n). There is also a proof in chapter 6 of CLRS, 2nd edition.

As for why the C++ standard requires that at most 3n comparisons are used:

From my experiments (see code below), it appears that fewer than 2n comparisons are actually needed. In fact, these lecture notes contain a proof that build_heap only uses 2(n - ⌈log n⌉) comparisons.

The bound from the standard seems to be more generous than required.


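# 1-indexed heap helpers (the root is node 1); Python 3.
def parent(i):
    return i // 2

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

def heapify_cost(n, i):
    # Worst-case comparisons to sift node i down: one per level where only a
    # left child exists, two per level where both children exist (this relies
    # on the left subtree never being the shallower one, as in a complete tree).
    most = 0
    if left(i) <= n:
        most = 1 + heapify_cost(n, left(i))
    if right(i) <= n:
        most = 1 + max(most, heapify_cost(n, right(i)))
    return most

def build_heap_cost(n):
    # range(n // 2, 1, -1) visits i = n // 2 down to 2 and skips the root
    # (i = 1); use range(n // 2, 0, -1) to include it.
    return sum(heapify_cost(n, i) for i in range(n // 2, 1, -1))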

Some results:

n                     10  20  50  100  1000  10000
build_heap_cost(n)     9  26  83  180  1967  19960
