
Wouldn't this algorithm run in O(m log n)?

I am working on a Software Engineer interview question from Glassdoor.

The question is:

Given a list of one million numbers, how will you find the top n numbers from the list in an efficient way?

Here is a solution an author gave at the same link:

  1. create a min heap
  2. take the first n of the m elements and place them in the heap (O(n))
  3. for each of the (m - n) remaining elements, if it is greater than the heap's find-min, insert it into the heap and delete the min (worst case O((m - n) log n) if the list is sorted)

The net result is that you can do this in O(n) memory and worst-case O((m - n) log n) runtime.
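The three steps above can be sketched in Python with the standard `heapq` module. This is only an illustration of the quoted approach, not the original author's code; the function name `top_n` is made up for the example, and `heapq.heapreplace` (pop the minimum, then push) stands in for the "insert and delete min" step.

```python
import heapq
from itertools import islice

def top_n(numbers, n):
    """Return the n largest values of `numbers` (in heap order, not sorted)."""
    if n <= 0:
        return []
    it = iter(numbers)
    # Step 2: take the first n elements and turn them into a min-heap.
    heap = list(islice(it, n))
    heapq.heapify(heap)
    # Step 3: each remaining element replaces the heap minimum if it is larger.
    for x in it:
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the min and push x, O(log n)
    return heap
```

The heap never holds more than n items, which is where the O(n) memory bound comes from.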

I agree with the author's algorithm and with the author's assessment of its space complexity. What I have an issue with is the author's analysis of the runtime for insertion into the heap, and of the overall time.

For the step "take first n of m elements and place in the heap", wouldn't that run in O(n log n)? At least according to my class notes (Heap Add), insertion is O(log n), and because you are inserting n elements, the runtime of that whole step would be O(n log n).

Taking that into consideration, wouldn't the overall runtime of this entire algorithm be, using big-O addition (Big Oh Addition):

O(n log n + (m - n) log n) = O(m log n)

Using that approach to building a heap, yes, but there is an O(n) algorithm for converting an array to a heap. See http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap for details.
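As a rough sketch of what such an O(n) conversion can look like, here is the usual bottom-up construction: sift down every internal node, starting from the last one. The helper names `sift_down` and `build_min_heap` are invented for this example; in practice Python's `heapq.heapify` does the same job.

```python
def sift_down(a, i, size):
    """Move a[i] down until neither child is smaller than it."""
    while True:
        smallest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < size and a[left] < a[smallest]:
            smallest = left
        if right < size and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def build_min_heap(a):
    """Turn the list `a` into a min-heap in place, bottom-up."""
    # Leaves are already valid heaps, so start at the last internal node.
    for i in range(len(a) // 2 - 1, -1, -1):
        sift_down(a, i, len(a))
```

Most nodes sit near the bottom of the tree and can only move down a few levels, which is why the total work is O(n) rather than O(n log n).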

That said, an O(m) time, O(n) memory solution exists for this problem, implemented by e.g. Guava's Ordering.leastOf. One implementation is:

  • create a buffer, an array of size 2n
  • loop through the original array, adding elements to the buffer
  • whenever the buffer is full, use an O(n) quickselect to keep only the highest n elements from the buffer and discard the rest
  • use one final quickselect to extract the highest n elements from the buffer

This requires O(m/n) quickselects, each of which takes O(n), for O(m) time total.
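The buffer idea above could look roughly like the following. This is a sketch under stated assumptions, not Guava's actual implementation: `largest_n` and `top_n_buffered` are hypothetical names, and the quickselect here uses a random pivot and temporary lists, so its O(n) cost per call is an expected bound rather than a worst-case one.

```python
import random

def largest_n(buf, n):
    """Quickselect-style pick of the n largest items of buf, in any order."""
    if n <= 0:
        return []
    if len(buf) <= n:
        return list(buf)
    kept, items, need = [], list(buf), n
    while need > 0:
        pivot = random.choice(items)
        greater = [x for x in items if x > pivot]
        equal = [x for x in items if x == pivot]
        if len(greater) >= need:
            items = greater  # everything still needed is above the pivot
        elif len(greater) + len(equal) >= need:
            kept += greater + equal[:need - len(greater)]
            need = 0
        else:
            kept += greater + equal
            need -= len(greater) + len(equal)
            items = [x for x in items if x < pivot]
    return kept

def top_n_buffered(numbers, n):
    """Return the n largest values using a buffer of at most 2n items."""
    if n <= 0:
        return []
    buf = []
    for x in numbers:
        buf.append(x)
        if len(buf) == 2 * n:
            buf = largest_n(buf, n)  # keep the top n, discard the rest
    return largest_n(buf, n)  # final selection on the partial buffer
```

Each selection runs on at most 2n items and is triggered once per n new elements, which matches the O(m/n) selections of O(n) each described above.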

For the step "take first n of m elements and place in the heap", wouldn't that run in O(n log n)?

Not necessarily. You can create a heap from n elements in O(n). See here for how that can be achieved.
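One way to see the O(n) bound, assuming the heap is built bottom-up by sifting down each internal node: at most ⌈n / 2^(h+1)⌉ nodes sit at height h, and sifting one of them down costs O(h), so the total work is

$$
\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil \cdot O(h)
= O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
= O(n),
$$

since the series $\sum_{h \ge 0} h / 2^{h}$ converges (to 2).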

So you'd have O(n + (m - n) log n) = O((m - n) log n) = O(m log n). The last step is correct only if n is considered to be a constant; otherwise you should keep it as m - n, as the author has.

Follow-up question: can you solve the whole problem in O(m)?
