Time complexity of using heaps to find Kth largest element

I have several implementations for finding the Kth largest element in an unsorted array. All three use either a min-heap or a max-heap, but I am having trouble figuring out the runtime complexity of one of them.

Implementation 1:

int findKthLargest(vector<int> vec, int k)
{
    // build min-heap
    make_heap(vec.begin(), vec.end(), greater<int>());

    for (int i = 0; i < k - 1; i++) {
        vec.pop_back();
    }

    return vec.back();
}

Implementation 2:

int findKthLargest(vector<int> vec, int k)
{
    // build max-heap
    make_heap(vec.begin(), vec.end());

    for (int i = 0; i < k - 1; i++) {
        // move max. elem to back (from front)
        pop_heap(vec.begin(), vec.end()); 
        vec.pop_back();
    }

    return vec.front();
}

Implementation 3:

int findKthLargest(vector<int> vec, int k)
{
    // max-heap prio. q
    priority_queue<int> pq(vec.begin(), vec.end());

    for (int i = 0; i < k - 1; i++) {
        pq.pop();
    }

    return pq.top();
}

From my reading, I assume the runtime of the SECOND one is O(n) + O(k log n) = O(n + k log n). Building the max-heap takes O(n), and each pop takes O(log n), so popping k times takes O(k log n).

However, this is where I get confused. For the FIRST one, with a min-heap, I assume building the heap is O(n). Since it is a min-heap, the larger elements are at the back. Then popping the back element k times costs k * O(1) = O(k). Hence, the complexity is O(n + k).

Similarly, for the third one, I assume the complexity is also O(n + k log n), by the same reasoning as for the max-heap.

But some sources say that this problem cannot be solved faster than O(n + k log n) with heaps/priority queues! In my FIRST example, however, I think the complexity is O(n + k). Correct me if I'm wrong. Any help is appreciated.

Properly implemented, getting the kth largest element from a min-heap is O((n - k) * log(n)), because the n - k smaller elements have to be removed first. Getting the kth largest element from a max-heap is O(k * log(n)).
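To make the min-heap bound concrete, here is a minimal C++ sketch. The function name `kthLargestViaMinHeap` is hypothetical, and it assumes 1 <= k <= n. It builds a min-heap of all n elements in O(n) and then discards the n - k smallest, each pop costing O(log n).

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: k-th largest element via a min-heap holding all n elements.
int kthLargestViaMinHeap(const std::vector<int>& v, int k) {
    assert(k >= 1 && k <= static_cast<int>(v.size()));
    // The range constructor heapifies all n elements: O(n).
    std::priority_queue<int, std::vector<int>, std::greater<int>> minHeap(v.begin(), v.end());
    // Discard the n - k smallest elements: O((n - k) log n).
    for (int i = 0; i < static_cast<int>(v.size()) - k; ++i) {
        minHeap.pop();
    }
    // Exactly k elements remain; the smallest of them is the k-th largest overall.
    return minHeap.top();
}
```

For example, `kthLargestViaMinHeap({3, 2, 1, 5, 6, 4}, 2)` returns 5.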

Your first implementation is not correct at all. For example, if you wanted to get the largest element from the heap (k == 1), the loop body would never execute. Your code assumes that the last element in the vector is the largest element in the heap. That is incorrect: a min-heap only guarantees that each parent is no larger than its children, so the last position can hold any leaf. For example, consider the heap:

 1
3 2

That is a perfectly valid min-heap, which would be represented by the vector [1,3,2]. On that heap, your first implementation would return 2 as the largest element (it is 3) and 3 as the 2nd largest element (it is 2).
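The failure can be reproduced without depending on how `std::make_heap` happens to arrange the elements. This sketch (the name `implementation1Logic` is hypothetical) applies Implementation 1's `pop_back` loop to a vector that is already a valid min-heap, such as [1,3,2]:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper mirroring Implementation 1's loop, starting from a vector
// that is already a valid min-heap so the result is deterministic.
int implementation1Logic(std::vector<int> minHeap, int k) {
    assert(std::is_heap(minHeap.begin(), minHeap.end(), std::greater<int>()));
    assert(k >= 1 && k <= static_cast<int>(minHeap.size()));
    for (int i = 0; i < k - 1; i++) {
        minHeap.pop_back();  // removes the last array slot, not the largest element
    }
    return minHeap.back();
}
```

With `{1, 3, 2}`, k = 1 returns 2 and k = 2 returns 3, the two wrong answers described above.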

The second solution looks like it would work.

Your first two solutions remove items from vec. That only affects the function's local copy, because vec is passed by value. Is that what you intended?

The third solution is correct. It takes O(n) to build the heap, O((k - 1) log n) to remove the k - 1 largest items, and then O(1) to access the largest remaining item.
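The same accounting written out as a sum, alongside the corresponding count for a min-heap, where the n - k smaller elements must be popped before the kth largest reaches the top:

```latex
\begin{aligned}
T_{\max}(n,k) &= \underbrace{O(n)}_{\text{build heap}}
              + \underbrace{(k-1)\,O(\log n)}_{\text{pop } k-1 \text{ times}}
              + \underbrace{O(1)}_{\text{read top}}
              = O(n + k\log n) \\
T_{\min}(n,k) &= O(n) + (n-k)\,O(\log n) + O(1) = O\bigl(n + (n-k)\log n\bigr)
\end{aligned}
```

So the max-heap version performs fewer pops when k < n/2, and the min-heap version when k > n/2.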

There is another way to do it that is potentially faster in practice. The idea is:

build a min-heap of size k from the first k elements in vec
for each following element
    if the element is larger than the smallest element on the heap
        remove the smallest element from the heap
        add the new element to the heap
return element at the top of the heap
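A minimal C++ sketch of that idea, assuming 1 <= k <= vec.size(); the name `kthLargestBoundedHeap` is hypothetical. `std::priority_queue` with `std::greater<int>` serves as the min-heap, and its range constructor heapifies the first k elements:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: keep the k largest elements seen so far in a min-heap of
// size k, so the top is always the k-th largest element seen so far.
int kthLargestBoundedHeap(const std::vector<int>& vec, int k) {
    assert(k >= 1 && k <= static_cast<int>(vec.size()));
    // Min-heap built from the first k elements: O(k).
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(vec.begin(), vec.begin() + k);
    for (std::size_t i = static_cast<std::size_t>(k); i < vec.size(); ++i) {
        if (vec[i] > heap.top()) {  // larger than the smallest of the current top k
            heap.pop();             // O(log k)
            heap.push(vec[i]);      // O(log k)
        }
    }
    return heap.top();
}
```

For example, `kthLargestBoundedHeap({3, 2, 3, 1, 2, 4, 5, 5, 6}, 4)` returns 4.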

Building the initial heap is O(k). Processing the remaining items is then O((n - k) log k) in the worst case. The worst case occurs when the initial vector is in ascending order, because then every remaining item is larger than the smallest element on the heap. That doesn't happen very often. In practice, only a small percentage of items are added to the heap, so you don't have to do all those removals and insertions.
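To see why ascending input is the worst case, this hypothetical instrumented version of the same loop (`countHeapUpdates`, assuming 1 <= k <= n) counts how often the heap is actually updated:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical instrumented bounded-heap loop: returns how many times the heap
// was updated (each update is one pop plus one push, O(log k) each).
std::size_t countHeapUpdates(const std::vector<int>& vec, int k) {
    assert(k >= 1 && k <= static_cast<int>(vec.size()));
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(vec.begin(), vec.begin() + k);
    std::size_t updates = 0;
    for (std::size_t i = static_cast<std::size_t>(k); i < vec.size(); ++i) {
        if (vec[i] > heap.top()) {  // only items beating the current k-th largest cost anything
            heap.pop();
            heap.push(vec[i]);
            ++updates;
        }
    }
    return updates;
}
```

For the values 1 through 10 with k = 3, ascending order causes 7 updates (every one of the n - k remaining items), while descending order causes none.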

Some heap implementations have a heap_replace method that combines the two steps of removing the top element and adding the new element. That reduces the cost by a constant factor: rather than an O(log k) removal followed by an O(log k) insertion, you get a constant-time replacement of the top element, followed by an O(log k) sift-down of the new element.
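The C++ standard library offers `make_heap`, `push_heap` and `pop_heap` but no replace-top operation (Python's `heapq.heapreplace` is one example of such a function). This sketch therefore writes a hypothetical `replaceTop` helper for a min-heap stored in a `std::vector` using the usual implicit binary layout (children of index i at 2i + 1 and 2i + 2), and assumes `std::make_heap` produces that layout, as mainstream implementations do. It also assumes 1 <= k <= vec.size().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: overwrite the root of a binary min-heap with `value` in
// O(1), then restore the heap property with a single O(log k) sift-down.
void replaceTop(std::vector<int>& heap, int value) {
    assert(!heap.empty());
    const std::size_t n = heap.size();
    std::size_t i = 0;
    heap[0] = value;
    while (true) {
        std::size_t left = 2 * i + 1, right = left + 1, smallest = i;
        if (left < n && heap[left] < heap[smallest]) smallest = left;
        if (right < n && heap[right] < heap[smallest]) smallest = right;
        if (smallest == i) break;  // both children are >= heap[i]
        std::swap(heap[i], heap[smallest]);
        i = smallest;
    }
}

// Hypothetical bounded-heap selection that uses replaceTop instead of pop + push.
int kthLargestWithReplace(const std::vector<int>& vec, int k) {
    assert(k >= 1 && k <= static_cast<int>(vec.size()));
    std::vector<int> heap(vec.begin(), vec.begin() + k);
    std::make_heap(heap.begin(), heap.end(), std::greater<int>());  // heap[0] is the smallest
    for (std::size_t i = static_cast<std::size_t>(k); i < vec.size(); ++i) {
        if (vec[i] > heap[0]) replaceTop(heap, vec[i]);
    }
    return heap[0];
}
```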

Here is a heap solution in Java. It keeps a min-heap of the largest elements seen so far: whenever the heap holds more than k elements, the smallest one is removed. After all elements are processed, the heap contains the k largest elements, and the kth largest element is at the top of the min-heap.

class Solution {
    int kLargest(int[] arr, int k) {
        
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b)-> Integer.compare(a, b));
        for(int a : arr) {
            heap.add(a);
            if(heap.size()>k) {
                // remove smallest element in the heap
                heap.poll();
            }
        }
        // return kth largest element
        return heap.poll();
    }
}

The worst-case time complexity is O(N log K), where N is the total number of elements. Inserting each of the first k elements takes one heapify (sift) operation. After that, each element costs two operations (one insert and one remove), each O(log K). So the worst-case time complexity is O(N log K). You can improve on this with other methods and bring the average-case time complexity of a heap update down to Θ(1). Read this for more info.


Quickselect: Θ(N) on average

If you're looking for a solution that is faster on average, the Quickselect algorithm, which is based on quicksort, is a good option. It has O(N) average-case time complexity and O(1) space complexity. Its worst-case time complexity is O(N^2), but choosing a random pivot (as the following code does) makes that scenario very unlikely. The following is quickselect code for finding the kth largest element.

class Solution {
    public int findKthLargest(int[] nums, int k) {
        return quickselect(nums, k);
    }
     
    private int quickselect(int[] nums, int k) {
        int n = nums.length;
        int start = 0, end = n-1;
        while(start<end) {
            int ind = partition(nums, start, end);
            if(ind == n-k) {
                return nums[ind];
            } else if(ind < n-k) {
                start = ind+1;
            } else {
                end = ind-1;
            }
        }
        return nums[start];
    }
    
    private int partition(int[] nums, int start, int end) {
        // choose a random pivot index in [start, end] (inclusive)
        int pivot = start + (int)(Math.random()*(end-start+1));
        swap(nums, pivot, end);
        
        int left=start;
        for(int curr=start; curr<end; curr++) {
            if(nums[curr]<nums[end]) {
                swap(nums, left, curr);
                left++;
            }
        }
        swap(nums, left, end);
        return left;
    }
    
    private void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }
}
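In C++ you rarely need to hand-write quickselect: `std::nth_element` performs this kind of partial selection, and the standard specifies its complexity as linear on average. A minimal sketch, assuming 1 <= k <= nums.size(); the function name is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: k-th largest via std::nth_element. With std::greater,
// index k - 1 ends up holding the element a descending sort would put there;
// everything before it is >= and everything after it is <=.
int kthLargestNthElement(std::vector<int> nums, int k) {  // by value: caller's data untouched
    assert(k >= 1 && k <= static_cast<int>(nums.size()));
    std::nth_element(nums.begin(), nums.begin() + (k - 1), nums.end(), std::greater<int>());
    return nums[k - 1];
}
```

Taking the vector by value keeps the same signature style as the question's implementations, so the caller's vector is not reordered.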

