
Wouldn't this algorithm run in O(m log n)?

I am working on a software engineer interview question from Glassdoor.

The question is:

Given a list of one million numbers, how will you find the top n numbers from the list in an efficient way?

Here is a solution an author gave at the same link:

  1. Create a min-heap.
  2. Take the first n of the m elements and place them in the heap (O(n)).
  3. For each of the (m - n) remaining elements, if it is greater than the heap's find-min, insert it into the heap and delete the min (worst case O((m - n) log n), if the list is sorted).

The net result is that you can do this in O(n) memory and worst-case O((m - n) log n) runtime.
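For illustration, here is a minimal Python sketch of the three steps quoted above. The function name `top_n_heap` is hypothetical, and seeding the heap with `heapq.heapify` is just one possible way to carry out step 2; the quoted solution does not say how the heap is built.

```python
import heapq
from itertools import islice


def top_n_heap(numbers, n):
    """Return the n largest values of `numbers` (in heap order, not sorted)."""
    if n <= 0:
        return []
    it = iter(numbers)
    # Step 2: take the first n elements and turn them into a min-heap.
    heap = list(islice(it, n))
    heapq.heapify(heap)
    # Step 3: for each remaining element, replace the current minimum
    # only if the new element is larger than it.
    for x in it:
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the min and push x in one step
    return heap


print(sorted(top_n_heap([5, 1, 9, 3, 7, 8, 2], 3)))
```

The heap never holds more than n elements, which is where the O(n) memory figure comes from.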

I agree with the author's algorithm and with the author's assessment of its space complexity. What I have an issue with is the author's analysis of the runtime for insertion into the heap, and therefore of the overall time.

For the step "take first n of m elements and place in the heap", wouldn't that run in O(n log n)? At least according to my class notes (Heap Add), a single insertion is O(log n), and because you are inserting n elements, the whole step would be O(n log n).

Taking that into consideration, and using the addition rule for big-O (Big Oh Addition), wouldn't the overall runtime of the entire algorithm be

O(n log n + (m - n) log n) = O(m log n)?

Using that approach to building a heap, yes, but there is an O(n) algorithm for converting an array to a heap. See http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap for details.
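As a rough sketch of the bottom-up construction described on that page: sift down every non-leaf node, starting from the last one. The names `sift_down` and `build_min_heap` are hypothetical. Most nodes sit near the leaves and can only move a short distance, which is why the total work is O(n) rather than O(n log n). In Python, the standard library's `heapq.heapify` does the same job.

```python
def sift_down(a, i, size):
    """Move a[i] down until both of its children are >= it."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < size and a[left] < a[smallest]:
            smallest = left
        if right < size and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest


def build_min_heap(a):
    """Rearrange the list `a` in place into a binary min-heap."""
    # Leaves are already valid heaps, so start at the last parent node.
    for i in range(len(a) // 2 - 1, -1, -1):
        sift_down(a, i, len(a))


data = [9, 4, 7, 1, 8, 2, 6, 3, 5]
build_min_heap(data)
print(data[0])  # the minimum is now at the root
```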

That said, an O(m) time, O(n) memory solution exists for this problem, implemented by e.g. Guava's Ordering.leastOf. One implementation is:

  • create a buffer, an array of size 2n
  • loop through the original array, adding elements to the buffer
  • whenever the buffer is full, use an O(n) quickselect to keep only the highest n elements from the buffer and discard the rest.
  • use one final quickselect to extract the highest n elements from the buffer

This requires O(m/n) quickselects, each of which takes O(n), for O(m) time total.
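The buffer approach above could look roughly like this in Python. This is a sketch under assumptions, not Guava's actual code: the names `keep_largest` and `top_n_buffered` are hypothetical, and the quickselect here uses a random pivot with a three-way partition, so its O(n) cost per call is an expected bound rather than a worst-case one.

```python
import random


def keep_largest(a, n):
    """Rearrange `a` in place so it holds only its n largest values."""
    if n >= len(a):
        return
    k = n - 1  # index where the n-th largest value must end up
    lo, hi = 0, len(a) - 1
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        # Three-way partition of a[lo..hi]: [> pivot | == pivot | < pivot]
        gt, i, lt = lo, lo, hi
        while i <= lt:
            if a[i] > pivot:
                a[gt], a[i] = a[i], a[gt]
                gt += 1
                i += 1
            elif a[i] < pivot:
                a[i], a[lt] = a[lt], a[i]
                lt -= 1
            else:
                i += 1
        if k < gt:
            hi = gt - 1      # boundary lies in the "greater" part
        elif k > lt:
            lo = lt + 1      # boundary lies in the "smaller" part
        else:
            break            # boundary lies among values equal to the pivot
    del a[n:]                # discard everything past the n largest


def top_n_buffered(numbers, n):
    """Return the n largest values of `numbers`, in no particular order."""
    if n <= 0:
        return []
    buf = []
    for x in numbers:
        buf.append(x)
        if len(buf) == 2 * n:    # buffer full: keep only the top n
            keep_largest(buf, n)
    keep_largest(buf, n)         # one final selection
    return buf


print(sorted(top_n_buffered([5, 1, 9, 3, 7, 8, 2, 9, 4], 3)))
```

Each full buffer costs one selection over 2n elements and frees n slots, so a selection happens about once per n input elements.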

For the step "take first n of m elements and place in the heap", wouldn't that run in O(nlogn)?

Not necessarily. You can create a heap from n elements in O(n). See here for how that can be achieved.

So you'd have O(n + (m - n) log n) = O((m - n) log n) = O(m log n). The last step is correct only if n is considered to be a constant; otherwise you should keep it as m - n, as the author has.

Followup question: can you solve the whole problem in O(m)?
