
Print the k largest elements of a max heap of size n in O(k log k) time

I tried writing an algorithm that prints the k largest elements of a max heap, but I cannot get it to run in the required complexity.

This is the pseudocode I wrote:

Print_k_largest(A[1,…,n], k):

If k > Heapsize(A): Error
i = 1
Insert(B, A[i])
print(B[1])
k -= 1

While k > 0:
    if 2*i < Heapsize(A):
        Insert(B, A[2*i])
        Insert(B, A[2*i+1])

    elif 2*i == Heapsize(A):
        Insert(B, A[2*i])

    B[1] = B[Heapsize(B)]
    Heapsize(B) -= 1
    Max-Heapify(B, 1)
    print(B[1])
    i = Binary_search(A[1,…,n], B[1])
    k -= 1


In this solution I build a new max heap B from the original one and keep its size at O(k), so that Max-Heapify and similar operations cost O(log k) each and the total is O(k log k), not O(k log n); O(k log k) is the bound I was asked to achieve. This pseudocode is based on the solution suggested here.

The idea is this: because A is a max heap, the largest element is the root, the second largest is one of the root's children, the third largest is either the other child or a child of the second largest, and so on. In each iteration I insert the children of the previous largest element (the one I just printed), remove that element, run Max-Heapify (so that the root is again the largest remaining candidate), and print the new root.

The principle behind this brilliant solution (unfortunately not mine, haha) is to make all the changes on a second heap whose size stays O(k), because each of the k iterations adds at most 2 new elements and removes one. That way operations like Max-Heapify run in O(log k) rather than O(log n).

The problem is that to add the children of the current largest element I need its location (index) in the original heap. I don't know how to get it without paying O(log n) and ruining everything.

I would appreciate any help.

(This might be exactly the algorithm you're already trying to implement; if so, I hope this phrasing makes it clearer.)

So I would create a second heap, but it would not just be a heap of values: it would be a heap of positions in the original heap, ordered by the value at each position.

Call the original heap A and the new heap B. Start by inserting the head of A into B. Then repeatedly pop the head of B and insert its children (in A) into B.

So if A is built out of nodes like:

Node(value : int, left : Option[Node], right : Option[Node])

Ordered by value

Then B will be built out of meta-nodes like:

MetaNode(value : Node, left : Option[MetaNode], right : Option[MetaNode])

Ordered by value.value

Initialize B with MetaNode(A.head) as its only element.

Then repeatedly do:

for i in range 0..k-1:
    current = B.pop.value
    B.push (current.left) //might be None, should be coded so this is a no-op
    B.push (current.right) //see above
    results.add(current.value)

That isn't the simplest algorithm I've ever described, so if anything is unclear, please ask and I'll try to describe it more clearly. :)
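The loop above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the `Node` dataclass, the function name `k_largest`, and the tie-breaking counter are made up for the example, values are assumed to be numbers (they are negated because `heapq` is a min-heap), and the result is returned as a list rather than printed.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import List, Optional


@dataclass
class Node:
    """A node of the original max heap A (hypothetical layout for this sketch)."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def k_largest(root: Optional[Node], k: int) -> List[int]:
    """Return up to k largest values of the max heap rooted at root, largest first."""
    results: List[int] = []
    tie = count()  # tie-breaker so heapq never has to compare two Node objects
    b = []         # heap B; entries are (-value, tie, node) because heapq is a min-heap

    def push(node: Optional[Node]) -> None:
        if node is not None:  # pushing a missing child is a no-op
            heapq.heappush(b, (-node.value, next(tie), node))

    push(root)
    while b and len(results) < k:
        _, _, current = heapq.heappop(b)  # largest remaining candidate
        results.append(current.value)
        push(current.left)
        push(current.right)
    return results


if __name__ == "__main__":
    a = Node(9, Node(7, Node(3), Node(6)), Node(8, Node(5)))
    print(k_largest(a, 4))  # [9, 8, 7, 6]
```

B gains at most two entries and loses one per iteration, so it never holds more than k + 1 entries and each push or pop costs O(log k).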

I have no idea why you think you need a binary search. The point of a heap is to not need to do that.

Remember that the key operations in a heap are as follows:

  1. Append - add an element to the end.
  2. Swap - exchange two elements.
  3. SiftDown - take an element and have it "fall down" to its place. That is, while it is not the root and is bigger than its parent, you Swap it with its parent and keep going. (These names follow the internal helpers of Python's heapq; many textbooks call this direction sift-up and the opposite one sift-down.) (Note: in the pointer version you compare not the pointers themselves but the values they point to. The principle is the same, but you may need to rewrite the heap code to do it.)
  4. HeapInsert - you Append, then SiftDown.
  5. SiftUp - the opposite of SiftDown. While the element has children and is smaller than at least one of them, you Swap it with the larger of its children and keep going.
  6. HeapPop - you Swap the first and last elements, remove the last into a temporary variable, SiftUp the first element, then return the temporary (what used to be the first element).

With these operations your pseudocode becomes:

Print_k_largest(A[1,…,n], k):

If k > Heapsize(A): Error
HeapInsert(B, pointer to A[1])

While k > 0:
    PtrToElement = HeapPop(B)
    print(PtrToElement.value)
    i = index in A of the element PtrToElement points to

    if 2*i < Heapsize(A):
        HeapInsert(B, pointer to A[2*i])
        HeapInsert(B, pointer to A[2*i+1])

    elif 2*i == Heapsize(A):
        HeapInsert(B, pointer to A[2*i])

    k -= 1

If you work in a language with native pointers, like C, you can construct pointers very easily using the & (aka "address of") and * (aka "thing pointed to by") operators. If you work in a language without pointers, you'll need to store indexes into A, like i and j, and then get the values back using A[i] and A[j]. That means A has to become an argument to every operation on your heap of "pointers", because the language doesn't support actual pointers.

Either way, the odds are good that you'll actually need to implement your own "heap of pointers".
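As a rough illustration of such a "heap of pointers" in a language without pointers, here is a minimal Python sketch in which B stores indexes into A and every operation takes A as an argument. The function names (`heap_insert`, `heap_pop`, `k_largest_by_index`) are invented for this example, A is assumed to be a 0-based array, so the children of index i are 2*i+1 and 2*i+2 instead of 2*i and 2*i+1, and the values are returned as a list rather than printed.

```python
from typing import List


def heap_insert(a: List[int], b: List[int], idx: int) -> None:
    """Append idx to B, then move it toward the root while A[idx] beats its parent."""
    b.append(idx)
    pos = len(b) - 1
    while pos > 0:
        parent = (pos - 1) // 2
        if a[b[pos]] <= a[b[parent]]:  # compare the values in A, not the indexes
            break
        b[pos], b[parent] = b[parent], b[pos]
        pos = parent


def heap_pop(a: List[int], b: List[int]) -> int:
    """Remove and return the index stored in B whose value in A is largest."""
    b[0], b[-1] = b[-1], b[0]
    top = b.pop()
    pos = 0
    while True:
        left, right, largest = 2 * pos + 1, 2 * pos + 2, pos
        if left < len(b) and a[b[left]] > a[b[largest]]:
            largest = left
        if right < len(b) and a[b[right]] > a[b[largest]]:
            largest = right
        if largest == pos:
            return top
        b[pos], b[largest] = b[largest], b[pos]
        pos = largest


def k_largest_by_index(a: List[int], k: int) -> List[int]:
    """Return the k largest values of the 0-based array max heap a, largest first."""
    if k > len(a):
        raise ValueError("k exceeds the heap size")
    b: List[int] = []
    out: List[int] = []
    if k > 0:
        heap_insert(a, b, 0)  # start with the root of A
    for _ in range(k):
        i = heap_pop(a, b)    # index in A of the largest remaining candidate
        out.append(a[i])
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(a):
                heap_insert(a, b, child)
    return out


if __name__ == "__main__":
    print(k_largest_by_index([9, 7, 8, 3, 6, 5], 4))  # [9, 8, 7, 6]
```

Because the popped entry is itself an index into A, its children are found in constant time and no search in A is needed.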

In Python I cheat. Rather than store integers in the heap, I store tuples: (A[i], i). That way the value and the index are kept together, and tuples compare by value first. If your language has dynamic typing and tuples, this can be more convenient than rewriting a heap implementation. But I doubt that the language you're working with will support this trick.
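A minimal sketch of that tuple trick, using the standard `heapq` module, might look like this. Some assumptions: the function name `k_largest_tuples` is made up, A is a 0-based array max heap of numbers, and because `heapq` is a min-heap the stored tuple is (-A[i], i) rather than (A[i], i).

```python
import heapq
from typing import List, Tuple


def k_largest_tuples(a: List[int], k: int) -> List[int]:
    """Return the k largest values of the 0-based array max heap a, largest first."""
    if k > len(a):
        raise ValueError("k exceeds the heap size")
    if k == 0:
        return []
    out: List[int] = []
    # B holds (negated value, index in A); heapq pops the smallest tuple,
    # which corresponds to the largest value in A.
    b: List[Tuple[int, int]] = [(-a[0], 0)]
    while len(out) < k:
        neg_value, i = heapq.heappop(b)
        out.append(-neg_value)
        for child in (2 * i + 1, 2 * i + 2):  # children of index i in a 0-based heap
            if child < len(a):
                heapq.heappush(b, (-a[child], child))
    return out


if __name__ == "__main__":
    print(k_largest_tuples([9, 7, 8, 3, 6, 5], 3))  # [9, 8, 7]
```

The index travels with the value, so the position in A comes back from the pop for free.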

Create a temporary max binary heap, delete its root (k - 1) times, and do a down-heap on it after every deletion. After that, the root of the temporary heap is the Kth largest element of the original heap, and you can simply print it. Time complexity: O(K log K).
