Overall complexity of finding Kth Largest element in Python

I was solving this LeetCode problem (link) and found an amazing solution using the heapq module; the running time of this function is very low. This is the program:

from itertools import islice
import heapq

def nlargest(n, iterable):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, reverse=True)[:n]
    """
    if n < 0:
        return []
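    it = iter(iterable)
    result = list(islice(it, n))  # first n items
    if not result:
        return result
    # Min-heap: result[0] is the smallest of the n largest items seen so far.
    heapq.heapify(result)
    _heappushpop = heapq.heappushpop  # local alias avoids an attribute lookup per item
    # Push each remaining item and pop the current minimum in one step,
    # so the heap always holds the n largest items seen so far.
    for elem in it:
        _heappushpop(result, elem)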
    result.sort(reverse=True)
    return result

print(nlargest(5, [10, 122, 2, 3, 3, 4, 5, 5, 10, 12, 23, 18, 17, 15, 100, 101]))  # [122, 101, 100, 23, 18]
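The question's title asks for the Kth largest element, while `nlargest` returns the K largest. With the same size-k min-heap, the root after the loop is already the answer, so the final O(k log k) sort can be skipped. A minimal sketch, assuming 1 <= k <= len(nums); `kth_largest` is a hypothetical name, not part of heapq:

```python
import heapq
from itertools import islice

def kth_largest(nums, k):
    """Return the k-th largest value in nums, assuming 1 <= k <= len(nums)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    it = iter(nums)
    heap = list(islice(it, k))          # first k items
    if len(heap) < k:
        raise ValueError("k is larger than the number of items")
    heapq.heapify(heap)                 # O(k)
    for x in it:
        heapq.heappushpop(heap, x)      # O(log k) each; keeps the k largest seen so far
    return heap[0]                      # smallest of the k largest = k-th largest

print(kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```

For comparison, `heapq.nlargest(k, nums)[-1]` from the standard library returns the same value.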

This algorithm is really clever, and you can also visualize it here: LINK

But I am having a hard time understanding the time complexity of the whole algorithm. Here is my analysis; please correct me if I am wrong!

Time complexity:

result = list(islice(it, n)) -> O(n)
heapq.heapify(result) -> O(len(result))
for elem in it: _heappushpop(result, elem) -> I am confused about this part
result.sort(reverse=True) -> O(len(result)*log(len(result)))
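For reference, `heappushpop` is a push immediately followed by a pop of the smallest item, done in a single sift. The sketch below is an equivalent written with `heapq.heapreplace` to make the cost visible: when the new item is not larger than the root, nothing moves and the call is O(1); otherwise the root is replaced and the new item is sifted through a heap of size n, which is O(log n). `pushpop_equivalent` is a hypothetical helper name:

```python
import heapq

def pushpop_equivalent(heap, item):
    """Same result as heapq.heappushpop(heap, item) on a min-heap of size n."""
    if heap and heap[0] < item:
        # item beats the current minimum: evict the root and sift the new
        # item from the top, touching O(log n) levels.
        return heapq.heapreplace(heap, item)
    # item is not larger than the minimum (or the heap is empty): pushing it
    # and popping the smallest would just return it, so the heap is untouched.
    return item

heap = [2, 3, 5]                            # already a valid min-heap
print(pushpop_equivalent(heap, 1), heap)    # 1 [2, 3, 5]
print(pushpop_equivalent(heap, 4), heap)    # 2 [3, 4, 5]
```

Because the heap never grows past n items, every call in the loop is bounded by O(log n), no matter how long the input is.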

Could anyone help me understand the overall time complexity of the algorithm?

So you have two relevant parameters here: n (the number of items to return) and, say, M (the number of items in the dataset).

islice(it, n) -- O(n)
heapify(result) -- O(n), because len(result) = n
for elem in it: _heappushpop(result, elem) -- performs M-n operations of O(log n) each, because len(result) stays n, i.e. (M-n)*log n
result.sort(reverse=True) -- O(n*log n)
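The per-step costs above can be checked empirically by counting comparisons. A rough sketch under stated assumptions: `Counted` is a hypothetical wrapper that counts `<` calls (heapq orders items using only `<`), and the input is ascending, so every new item beats the heap's minimum and every `heappushpop` has to sift:

```python
import heapq
import math
from itertools import islice

class Counted:
    """Hypothetical wrapper that counts every < comparison heapq makes."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

def loop_comparisons(n, m):
    """Comparisons made by the heappushpop loop alone, on ascending input of length m."""
    it = iter([Counted(v) for v in range(m)])
    heap = list(islice(it, n))
    heapq.heapify(heap)
    Counted.comparisons = 0            # reset so heapify is not counted
    for elem in it:                    # runs m - n times
        heapq.heappushpop(heap, elem)
    return Counted.comparisons

for n, m in [(8, 10_000), (8, 20_000), (1024, 20_000)]:
    c = loop_comparisons(n, m)
    print(f"n={n:<5} M={m:<6} comparisons={c:<7} "
          f"per element={c / (m - n):.1f}  log2(n)={math.log2(n):.0f}")
```

The count should roughly double from M=10000 to M=20000 while the per-element figure stays flat, and the per-element figure rises when n grows from 8 to 1024, roughly in line with the log n factor. On random input most items lose to the minimum after a while, so the loop is typically cheaper than this.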

Overall:

n + n + (M-n)*log n + n*log n

This results in O(M*log n). The dominant part is clearly the heappushpop loop (assuming M >> n; otherwise the problem is less interesting, because the solution more or less reduces to sorting).
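Collecting the terms of the sum shows why the final sort does not change the result. Assuming n <= M (so the loop really runs M - n times), the two log terms combine into a single M log n:

```latex
\begin{aligned}
T(n, M) &= \underbrace{n}_{\texttt{islice}}
         + \underbrace{n}_{\texttt{heapify}}
         + \underbrace{(M - n)\log n}_{\texttt{heappushpop}\text{ loop}}
         + \underbrace{n \log n}_{\texttt{sort}} \\
        &= 2n + M \log n \\
        &= O(M \log n) \qquad \text{since } n \le M .
\end{aligned}
```

Strictly, each heappushpop costs O(1 + log n) (it makes one comparison even when nothing moves), so for n = 1 the loop is O(M) rather than zero; for n >= 2 this does not change the bound. Setting n = M gives O(M log M), the cost of a full sort.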


It is worth pointing out that there are linear-time algorithms for solving this problem, so if your dataset is very big, it is worth checking them out.
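One such approach is quickselect: partition around a pivot and keep only the side that contains the answer. With a random pivot it runs in expected O(M) time, though its worst case is O(M^2) (the median-of-medians pivot rule makes it worst-case linear, at the cost of a larger constant). A minimal sketch, assuming 1 <= k <= len(nums) and mutually comparable values; `quickselect_kth_largest` is a hypothetical name, and this version builds new lists at each step instead of partitioning in place, trading memory for readability:

```python
import random

def quickselect_kth_largest(nums, k, rng=random):
    """k-th largest value of nums (1 <= k <= len(nums)) in expected O(len(nums)) time."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    items = list(nums)
    while True:
        pivot = rng.choice(items)                 # random pivot -> expected linear time
        larger = [x for x in items if x > pivot]
        smaller = [x for x in items if x < pivot]
        n_equal = len(items) - len(larger) - len(smaller)
        if k <= len(larger):
            items = larger                        # answer is strictly above the pivot
        elif k <= len(larger) + n_equal:
            return pivot                          # answer equals the pivot
        else:
            k -= len(larger) + n_equal            # skip everything >= pivot
            items = smaller

print(quickselect_kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```

Unlike the heap version, this needs the whole dataset in memory, so the heap approach remains the better fit for streaming input where only O(n) memory is available.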
