Wouldn't this algorithm run in O(m log n)?

Posted 2015-03-10 20:11:47 · Tags: java, algorithm, runtime, big-o, time-complexity

I am working on a Software Engineer interview question from Glassdoor.

The question is:

Given a list of one million numbers, how would you find the top n numbers in the list efficiently?

Here is a solution that an author posted at the same link:

create a min-heap
take the first n of the m elements and place them in the heap (O(n))
for each of the (m - n) remaining elements, if it is greater than the heap's find-min, insert it into the heap and delete the min (worst case O((m - n) log n) if the list is sorted)

Net result: you can do this with O(n) memory usage and worst-case O((m - n) log n) runtime.
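A minimal Java sketch of the author's min-heap approach follows. It assumes `int[]` input and that ties may be broken arbitrarily. It uses `java.util.PriorityQueue`, which is a binary min-heap. The name `topN` is hypothetical. The sketch does delete-min before insert rather than after, which keeps the heap at exactly n elements and gives the same result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopNHeap {
    // Hypothetical helper: returns the n largest values of a, largest first.
    static List<Integer> topN(int[] a, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) return result;
        // Min-heap of the n largest values seen so far; peek() is find-min.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (int x : a) {
            if (heap.size() < n) {
                heap.offer(x);            // first n elements: n inserts at O(log n) each
            } else if (x > heap.peek()) { // larger than the smallest value kept so far
                heap.poll();              // delete-min, O(log n)
                heap.offer(x);            // insert, O(log n)
            }
        }
        result.addAll(heap);
        result.sort(Collections.reverseOrder()); // only for readable output
        return result;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 9, 3, 7, 2, 8};
        System.out.println(topN(data, 3)); // prints [9, 8, 7]
    }
}
```

This sketch fills the heap with `offer()` calls, so its first phase is exactly the O(n log n) step the question asks about.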

I agree with the author's algorithm and with the author's assessment of its space complexity. What I take issue with is the author's analysis of the runtime for inserting into the heap, and of the overall runtime.

For the step "take the first n of the m elements and place them in the heap", wouldn't that run in O(n log n)? At least according to my class notes (Heap Add), each insertion is O(log n), and because you are inserting n elements, the runtime of that whole step would be O(n log n).
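The following hypothetical array-backed min-heap is built by repeated insertion (sift-up), which is what applying "Heap Add" to each of the n elements amounts to. Each insert walks at most one leaf-to-root path, costing O(log n). Building a heap this way is therefore O(n log n) in the worst case. For example, inserting values in decreasing order makes every new element sift all the way up to the root. The class and method names are illustrative only.

```java
import java.util.Arrays;

public class HeapByInsertion {
    // Builds a min-heap by inserting values one at a time: n inserts, O(log n) each.
    static int[] buildByInsertion(int[] values) {
        int[] heap = new int[values.length];
        int size = 0;
        for (int v : values) {
            heap[size] = v;      // place the new value at the next free leaf
            siftUp(heap, size);  // restore heap order along the path to the root
            size++;
        }
        return heap;
    }

    // Moves heap[i] up while it is smaller than its parent.
    static void siftUp(int[] heap, int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (heap[parent] <= heap[i]) break;
            int tmp = heap[parent]; heap[parent] = heap[i]; heap[i] = tmp;
            i = parent;
        }
    }

    // Checks the min-heap property: every parent <= its children.
    static boolean isMinHeap(int[] h) {
        for (int i = 1; i < h.length; i++) {
            if (h[(i - 1) / 2] > h[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] heap = buildByInsertion(new int[]{9, 4, 7, 1, 8, 2});
        // prints [1, 4, 2, 9, 8, 7] heap? true
        System.out.println(Arrays.toString(heap) + " heap? " + isMinHeap(heap));
    }
}
```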

Taking that into consideration, wouldn't the overall runtime of the entire algorithm be, using Big-O addition,

O(n log n + (m - n) log n) = O(m log n)?

2 answers

With that approach to building a heap, yes, but there is an O(n) algorithm for converting an array into a heap. See http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap for details.
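Below is a minimal sketch of the standard bottom-up construction (Floyd's method) on a plain `int[]`. The class and method names are illustrative. It sifts down each internal node, starting from the last internal node and moving back to the root. Most nodes sit near the bottom of the tree and can only move a few levels, so the total work is O(n) rather than O(n log n).

```java
import java.util.Arrays;

public class BuildHeap {
    // Rearranges a into a min-heap in place, in O(n) time.
    static void heapify(int[] a) {
        // Indices >= a.length / 2 are leaves (already 1-element heaps);
        // sift down every internal node, from the last one back to the root.
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            siftDown(a, i, a.length);
        }
    }

    // Moves a[i] down until it is <= both children (heap of size n).
    static void siftDown(int[] a, int i, int n) {
        while (true) {
            int left = 2 * i + 1;
            if (left >= n) return;
            int right = left + 1;
            int smallest = (right < n && a[right] < a[left]) ? right : left;
            if (a[i] <= a[smallest]) return;
            int tmp = a[i]; a[i] = a[smallest]; a[smallest] = tmp;
            i = smallest;
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2};
        heapify(a);
        System.out.println(Arrays.toString(a)); // prints [1, 4, 2, 9, 8, 7]
    }
}
```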

That said, an O(m)-time, O(n)-memory solution exists for this problem, implemented by, e.g., Guava's Ordering.leastOf. One implementation is:

create a buffer: an array of size 2n
loop through the original array, adding elements to the buffer
whenever the buffer is full, use an (expected) O(n) quickselect to keep only the highest n elements in the buffer and discard the rest
use one final quickselect to extract the highest n elements from the buffer

This requires O(m/n) quickselects, each taking (expected) O(n) time on the 2n-element buffer, for O(m) time in total.
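Here is a Java sketch of the buffer-and-quickselect scheme above, assuming `int[]` input and random pivots. It illustrates the idea only and is not Guava's actual `Ordering.leastOf` code. `topN` and `selectLargest` are hypothetical names. Quickselect here is randomized, so the O(n)-per-pass and O(m)-total figures are expected bounds. The three-way partition keeps long runs of equal values from degrading it.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

public class TopNQuickselect {
    // Hypothetical helper: returns the n largest values of a, in no particular order.
    static int[] topN(int[] a, int n) {
        if (n <= 0) return new int[0];
        int[] buf = new int[2 * n]; // buffer of size 2n
        int size = 0;
        for (int x : a) {
            buf[size++] = x;
            if (size == buf.length) {
                // Buffer full: move its n largest values to buf[0..n) and drop the rest.
                selectLargest(buf, size, n);
                size = n;
            }
        }
        // Final quickselect over whatever is left (fewer than 2n values).
        int k = Math.min(n, size);
        selectLargest(buf, size, k);
        return Arrays.copyOf(buf, k);
    }

    // Rearranges buf[0..size) so that its k largest values occupy buf[0..k).
    // Randomized quickselect: expected O(size) time.
    static void selectLargest(int[] buf, int size, int k) {
        if (k <= 0 || k >= size) return;
        int lo = 0, hi = size - 1, target = k - 1;
        while (lo < hi) {
            int pivot = buf[lo + ThreadLocalRandom.current().nextInt(hi - lo + 1)];
            // Three-way partition in descending order:
            // buf[lo..lt) > pivot, buf[lt..gt] == pivot, buf(gt..hi] < pivot.
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (buf[i] > pivot) swap(buf, lt++, i++);
                else if (buf[i] < pivot) swap(buf, i, gt--);
                else i++;
            }
            if (target < lt) hi = lt - 1;
            else if (target > gt) lo = gt + 1;
            else return; // position k-1 holds a pivot-equal value: done
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] top = topN(new int[]{5, 1, 9, 3, 7, 2, 8, 9, 4}, 3);
        Arrays.sort(top); // sorted only for display
        System.out.println(Arrays.toString(top)); // prints [8, 9, 9]
    }
}
```

If the n results are needed in order, sorting them adds O(n log n) on top of the O(m) expected selection.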

For the step "take the first n of the m elements and place them in the heap", wouldn't that run in O(n log n)?

Not necessarily. You can build a heap from n elements in O(n) time; the standard bottom-up heap construction achieves this.
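The O(n) bound for bottom-up construction comes from a standard counting argument. A heap with n nodes has at most ⌈n/2^(h+1)⌉ nodes of height h. Sifting down a node of height h costs O(h). For every height that occurs (h ≤ ⌊log₂ n⌋), n/2^(h+1) ≥ 1/2, so the ceiling is at most n/2^h. Summing over all heights:

```latex
\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
  \;\le\; O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
  \;=\; O(2n) \;=\; O(n),
\qquad \text{since } \sum_{h=0}^{\infty} \frac{h}{2^{h}} = 2 .
```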

So you'd have O(n + (m - n) log n), which is at most O(m log n). Collapsing it to O((m - n) log n) = O(m log n) is only tight when n is small compared with m (for example, when n is considered to be a constant); otherwise you should keep the m - n term, as the author did.
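The following makes the condition on the last step concrete. When n is at most a constant fraction of m (n ≤ m/2 below), m - n is within a factor of 2 of m, so the two bounds coincide. When n is close to m they do not, and the O(n) term dominates.

```latex
\begin{aligned}
n \le \tfrac{m}{2} &\;\Longrightarrow\; \tfrac{m}{2}\log n \le (m-n)\log n \le m\log n
  \;\Longrightarrow\; (m-n)\log n = \Theta(m\log n), \\
n = m-1 &\;\Longrightarrow\; n + (m-n)\log n = (m-1) + \log(m-1) = \Theta(m),
  \text{ whereas } m\log n = \Theta(m\log m).
\end{aligned}
```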

Follow-up question: can you solve the whole problem in O(m)?