Why (in Python) is random.randint so much slower than random.random?

I got curious about the relative speeds of some random integer generating code. I wrote the following to check it out:

from random import random
from random import choice
from random import randint
from math import floor
import time

def main():
    times = 1000000

    startTime = time.time()
    for i in range(times):
        randint(0,9)
    print(time.time()-startTime)

    startTime = time.time()
    for i in range(times):
        choice([0,1,2,3,4,5,6,7,8,9])
    print(time.time()-startTime)

    startTime = time.time()
    for i in range(times):
        floor(10*random())##generates random integers in the same range as randint(0,9)
    print(time.time()-startTime)

main()

The results of one trial of this code were:

0.9340872764587402

0.6552846431732178

0.23188304901123047

Even with the extra multiplication and the math.floor call, the last way of generating integers was by far the fastest. Messing with the size of the range the numbers were generated from didn't change anything.

So, why is random so much faster than randint? And is there any reason (besides ease of use, readability, and not inviting mistakes) why one would prefer randint over random (e.g., does randint produce better pseudo-random integers)? If floor(x*random()) doesn't feel readable enough but you want faster code, should you go for a specialized routine?

def myrandint(low, high):  ### still about 1.6x the runtime of the inline version above, but almost 2.5 times faster than random.randint
    ## Returns a random integer between low and high, inclusive. Results may not be what
    ## you expect if int(low) != low, etc., but the numpty who writes 'randint(1.9,3.2)' gets what they deserve.
    return floor((high - low + 1) * random()) + low
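
For reference, a quick way to sanity-check the numbers in that comment is the timeit module, which is less noisy than timing loops by hand with time.time(). A sketch, assuming myrandint is defined at the top level of the same script:

from timeit import timeit

n = 1_000_000
t_randint = timeit("randint(0, 9)", setup="from random import randint", number=n)
t_inline = timeit("floor(10 * random())",
                  setup="from random import random; from math import floor", number=n)
t_custom = timeit("myrandint(0, 9)", setup="from __main__ import myrandint", number=n)
print(t_randint, t_inline, t_custom)  # absolute values vary by machine; compare the ratios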

Before I answer your question (and don't worry, I do get there), take note of the common programmer's idiom:

Premature optimization is the root of all evil.

While this isn't always the case, don't worry about micro-optimizations unless you need them.

This goes double for Python: if you're writing something where speed is critical, you'll usually want to write it in a language that will run faster, like C. You can then write Python bindings for that C code if you want to use Python for the non-critical parts of your application (as is the case with, for example, NumPy).

Instead of focusing on making individual expressions or functions in your code run as fast as possible, focus on the algorithms you use and the overall structure of your code (and on making it readable, but you are already aware of that). Then, when your application starts running slowly, you can profile it to figure out what parts take the most time, and improve only those parts.

The changes will be easier to make to well-structured, readable code, and optimizing the actual bottlenecks will generally give a much better speedup-to-coding-time ratio than most micro-optimizations. The time spent wondering which of two expressions runs faster is time you could have spent getting other things done.

As an exception, I'd say learning why one option is faster than the other is sometimes worth the time, because then you can incorporate that more general knowledge into your future programming, letting you make quicker calls without worrying about the details.

But enough about why we shouldn't waste time worrying about speed; let's talk about speed.


Taking a look at the source of the random module (for CPython 3.7.4), this line from the end of the opening comment provides a short answer:

* The random() method is implemented in C, executes in a single Python step,
  and is, therefore, threadsafe.

The first statement is the one that matters most to us. random is a Python binding for a C function, so the whole of its work runs at the blinding speed of machine code rather than the relatively slow speed of Python.

randint, on the other hand, is implemented in Python, and suffers a significant speed penalty for it. randint calls randrange, which ensures that the range's bounds (and step size) are integers, that the range isn't empty, and that the step size isn't zero, before calling getrandbits, which is implemented in C.
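
You can see this split for yourself: random comes straight from the C extension module, while randint and randrange are ordinary Python methods defined in random.py. A small check (the printed path and exact source will vary with your installation and Python version):

import inspect
import random

# random.random is a built-in method of the C-implemented base class,
# so there is no Python source to show for it.
print(type(random.random))  # <class 'builtin_function_or_method'>

# randint, by contrast, is defined in pure Python in random.py ...
print(inspect.getsourcefile(random.Random.randint))
# ... and its body just defers to randrange.
print(inspect.getsource(random.Random.randint))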

This alone produces the majority of randint's slowness. However, there is one more variable in play.

Going a little deeper, into the internal function _randbelow, it turns out that the algorithm for getting a random integer below n is very straightforward: it gets the number of bits in n, then repeatedly generates that many bits at random until the result is less than n.
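
In outline, the loop looks like this (a simplified sketch of the idea behind _randbelow, not the exact CPython source, which also has a fallback path for subclasses):

from random import getrandbits

def randbelow_sketch(n):
    """Return a random int in [0, n), roughly the way random._randbelow does."""
    k = n.bit_length()      # number of bits needed to represent n
    r = getrandbits(k)      # uniform over [0, 2**k)
    while r >= n:           # values of n or above are rejected...
        r = getrandbits(k)  # ...and redrawn until one lands in [0, n)
    return r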

On average (across all possible values of n), this has little effect, but comparing the extremes, it is noticeable.

I wrote a function that tests the impact of that loop. Here are the results:

bits   2 ** (n - 1)   (2 ** n) - 1   ratio
   1   1.122583558    1.06002008     1.059021031
   2   1.083326405    1.008945953    1.0737209479
   4   1.071182065    0.900332951    1.1897621472
   8   1.074771422    0.91913078     1.1693345989
  16   1.144971642    0.920407928    1.2439828115
  32   1.134300228    0.927834944    1.2225237208
  64   1.244957927    0.96199336     1.2941439918
 128   1.293113046    1.00158057     1.2910724157
 256   1.366579178    1.069473996    1.2778049612
 512   1.629956014    1.190126045    1.3695658715

The first column is the number of bits, the second and third are the average time (in microseconds) to find a random integer with that many bits, over 1,000,000 runs. The last column is the ratio of the second and third columns.
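
The test function itself isn't reproduced here; a minimal sketch of one way to get comparable numbers (poking at the private random._inst instance, which is a CPython implementation detail, not public API) might look like this:

from random import _inst  # the module-level Random() instance (CPython detail)
from timeit import timeit

_randbelow = _inst._randbelow  # private helper; not part of the public API

def avg_call_us(bound, runs=1_000_000):
    """Rough average time per _randbelow(bound) call, in microseconds.
    The lambda adds a small constant overhead to every call."""
    return timeit(lambda: _randbelow(bound), number=runs) / runs * 1e6

print(f"{'bits':>4} {'2**(n-1)':>12} {'2**n - 1':>12}")
for bits in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512):
    smallest = 2 ** (bits - 1)  # smallest number with this many bits
    largest = 2 ** bits - 1     # largest number with this many bits
    print(f"{bits:>4} {avg_call_us(smallest):12.6f} {avg_call_us(largest):12.6f}")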

You'll notice that the average runtimes for the smallest number with a given bit length are larger than for the largest number with that bit length. This is because of that loop:

When looking for an n-bit number no greater than the largest n-bit number, a single attempt is almost always enough (with that bound, the only draw that can be rejected is 2^n − 1 itself). But to find a number smaller than the smallest n-bit number (2^(n−1) is a single 1-bit followed by n−1 zero bits), half of the attempts fail.
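
You can also see the rejection rate directly by counting draws instead of timing them; a small simulation along those lines:

from random import getrandbits

def average_draws(bound, trials=100_000):
    """Average number of getrandbits() calls the rejection loop needs
    to produce one value below `bound`."""
    k = bound.bit_length()
    draws = 0
    for _ in range(trials):
        while True:
            draws += 1
            if getrandbits(k) < bound:  # accepted: the value is in [0, bound)
                break
    return draws / trials

print(average_draws(2 ** 15))      # smallest 16-bit bound: about 2 draws per value
print(average_draws(2 ** 16 - 1))  # largest 16-bit bound: about 1 draw per value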
