
Big O-notation given probabilistic bounds?

I'm trying to estimate the runtime complexity of the following algorithm:

for i in range(0, len(arr)):
    for j in range(i+1, len(arr)):
        if distance(arr[i], arr[j]) > 2:
            pass

The complexity of the distance function is min(len(arg1), len(arg2)). In theory, the maximum length of the arguments can be up to N, but in practice it is usually not more than 20% of N.

From this, I can estimate the runtime function as:

f(N) = N * (N/2) * (N * 0.2)
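As a rough sanity check, here is a sketch that tallies the work directly (assuming, per the observation above, that each distance() call costs about 0.2 * N steps):

def estimated_work(n, avg_cost_frac=0.2):
    # Tally simulated steps for the nested loop, charging each
    # (hypothetical) distance() call about avg_cost_frac * n steps.
    work = 0
    for i in range(n):
        for j in range(i + 1, n):
            work += avg_cost_frac * n
    return work

print(estimated_work(100))  # 99000.0, close to 0.1 * 100**3 = 100000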

Is that O(N^2) in big-O notation, or O(N^3)? If it's O(N^3), then how does one justify that the runtime is always going to be closer to O(N^2) than to O(N^3) in practice?

Thanks

You ask:

If it's O(n^3), then how does one justify that the runtime is always going to be closer to O(n^2) than to O(n^3) in practice?

The answer is that "closer" doesn't matter. Only "bigger than" and "smaller than" matter.

Big O gives an upper bound

If the runtime of a procedure eventually exceeds c * n^2 for any constant c, given a big enough value of n, then it cannot possibly be O(n^2).

That's because the big-O operator doesn't give an estimate; it gives an upper bound. Even a procedure that runs in constant time is still O(n^3). (It's also O(n^2), O(log(n)), O(n!), and so on.) That's because it's smaller than all of those runtimes for some constant multiplier c and large values of n. For instance, a procedure that always takes exactly 5 steps satisfies 5 <= 5 * n^3 for every n >= 1, so it is O(n^3) with c = 5.

A concrete example

To make this concrete, consider the following:

>>> def fa(n):
...     return n * n * n // 10
... 
>>> def f2(n):
...     return n * n
... 
>>> def f3(n):
...     return n * n * n
... 

For the above runtimes and small n, fa is still less than or equal to f2:

>>> fa(10), f2(10), f3(10)
(100, 100, 1000)

But if we multiply n by 10, fa exceeds f2:

>>> fa(100), f2(100), f3(100)
(100000, 10000, 1000000)

And it's not hard to see that even if we boost f2 by a constant multiplier c, we can still find a value of n such that fa(n) is larger:

>>> def f2_boost(n, c):
...     return f2(n) * c
... 
>>> fa(1000), f2_boost(1000, 10), f3(1000)
(100000000, 10000000, 1000000000)

Why use a constant multiplier?

You might still find it confusing that a procedure with a runtime of n^3 * 0.1 falls in the same big-O category as a procedure with a runtime of 1000 * n^3. After all, the absolute difference between these two runtimes is huge!

This is a bit harder to explain, but it starts to make sense when you remind yourself that big O notation is supposed to describe scaling behavior. Or, to put it another way, big O notation is supposed to describe how the runtime varies when we change the size of the units that we use for our measurements.

Let's take a concrete example: imagine you want to know the height of a building, and suppose someone says, "oh, it's about 300 meters." You might feel satisfied by that response; you might not care that it's really 315 meters; 300 is a good enough estimate. But what if, instead, they said "oh, it's about 300 meters... or was that 300 feet?" You'd probably feel a lot less satisfied, because 300 meters would be more than three times as high as 300 feet.

In computer science, we have exactly that problem when measuring time. In fact, it's even worse: different computers might be vastly faster or slower than others. If we measure time in "number of calculations performed by a computer," then for some computers we will be measuring time in hundredths of a second, and for others in billionths of a second. If we want to describe the behavior of the algorithm in a way that isn't skewed by that huge difference, then we need a measurement that is "scale invariant" -- that is, a measurement that gives the same answer whether we use hundredths of seconds or billionths of seconds as our units.

Big O notation provides such a measurement. It gives us a way to measure runtime without needing to worry so much about the size of the units we use to measure time. In essence, saying that an algorithm is O(n^2) is saying that for any unit of time equal to or larger than some value c, there is a corresponding value for n such that our procedure will complete before c * n^2 for all larger values of n.
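To sketch that scale-invariance concretely (the conversion factor of 1000000 below is made up for illustration), measuring the same n^2 procedure in two different time units changes only the constant multiplier, not how the cost grows when n doubles:

>>> def steps(n):
...     return n * n
... 
>>> steps(2000) / steps(1000)
4.0
>>> (steps(2000) * 1000000) / (steps(1000) * 1000000)
4.0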

Estimating runtimes

If you want to talk about estimating runtimes, then you want a measurement called "big theta." Take a look at this answer for details. In short, big O gives an upper bound for an arbitrarily large multiplier c; big omega gives a lower bound for an arbitrarily large multiplier c; and big theta gives a function that defines both an upper bound and a lower bound, depending on the choice of multiplier c.

In your case, the big theta value would be Θ(n^3), because you can choose a constant multiplier c1 such that c1 * n^3 is always larger than n^3 / 10, and you can choose a constant multiplier c2 such that c2 * n^3 is always smaller than n^3 / 10.
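A quick numeric check of those two bounds, taking (for example) c1 = 1 and c2 = 0.01:

>>> def runtime(n):
...     return n ** 3 / 10
... 
>>> all(0.01 * n ** 3 <= runtime(n) <= 1 * n ** 3 for n in (10, 1000, 100000))
True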

It's still O(n^3). It might be tempting to think that O(0.0000001 * n^3) is "better" than O(n^2). But if we discuss the theoretical complexity of an algorithm, just assume that n can be as big as 10^100, and you'll see that O(n^3) is always "worse" in terms of performance.
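For instance, already at n = 10^8, 0.0000001 * n^3 = 10^17, which dwarfs n^2 = 10^16, and the gap only widens as n grows.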

Let len(arr) = N.

Since the primitive statement is inside the second loop, let us see how many times it runs.

The inner loop runs:

N-1 times the first time,

N-2 times the second time,

N-3 times the third time,

...

1 time the (N-1)th time.

Clearly the total sum would be (N-1)(N)/2 = X (say). The distance function is executed X times, and in asymptotic analysis we consider the worst case, which means that the complexity of the distance function is O(N).
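A sketch that verifies this count directly (the loop body is reduced to a bare counter here, since only the number of executions matters):

def count_inner_iterations(n):
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1
    return count

print(count_inner_iterations(50))  # 1225
print((50 - 1) * 50 // 2)          # 1225, i.e. (N-1)(N)/2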

Hence T(N) = ((N-1)(N)/2) * N = Y (say).

Using the definition of Big O:

Y <= c(N^3) for all N >= 1, with c = 1.

Therefore T(N) = O(N^3).
