
Differences between numpy.random and random.random in Python

I have a big script in Python. I took inspiration from other people's code, so I ended up using the numpy.random module for some things (for example, for creating an array of random numbers drawn from a binomial distribution) and, in other places, the standard library's random module (random.random).

Can someone please tell me the major differences between the two? Looking at the documentation for each, it seems to me that numpy.random just has more methods, but I am unclear about how the generation of the random numbers differs.

The reason I am asking is that I need to seed my main program for debugging purposes. But it doesn't work unless I use the same random number generator in all the modules I am importing. Is this correct?

Also, I read a discussion in another post about NOT using numpy.random.seed(), but I didn't really understand why this was such a bad idea. I would really appreciate it if someone could explain why this is the case.

You have made many correct observations already!

Unless you'd like to seed both random generators, it's probably simpler in the long run to pick one generator or the other. But if you do need to use both, then yes, you'll also need to seed them both, because they generate random numbers independently of each other.
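Here is a minimal sketch of what "seed them both" means in practice, assuming NumPy is installed. The helpers `seed_everything` and `draw` are hypothetical names for illustration; they are not part of either library. The second half shows that seeding only the stdlib generator does not reset NumPy's stream.

```python
import random

import numpy as np

SEED = 12345  # arbitrary example seed


def seed_everything(seed: int) -> None:
    """Seed both global generators; each keeps its own independent state."""
    random.seed(seed)     # the stdlib `random` module's global generator
    np.random.seed(seed)  # NumPy's legacy global generator behind numpy.random.*


def draw() -> tuple:
    # One value from the stdlib generator, three binomial samples from NumPy's.
    return random.random(), np.random.binomial(n=10, p=0.5, size=3).tolist()


seed_everything(SEED)
first = draw()
seed_everything(SEED)
second = draw()
print(first == second)  # True: reseeding both reproduces both streams

# Seeding only the stdlib generator leaves NumPy's stream where it was,
# so these are two consecutive (different) draws from NumPy's generator.
random.seed(SEED)
numpy_a = np.random.random()
random.seed(SEED)
numpy_b = np.random.random()
```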

For numpy.random.seed(), the main difficulty is that it is not thread-safe. It seeds a single global generator shared by the whole program, so it's not safe to use if you have many different threads of execution: it's not guaranteed to work if two different threads are executing the function at the same time. If you're not using threads, and you can reasonably expect that you won't need to rewrite your program this way in the future, numpy.random.seed() should be fine. If there's any reason to suspect that you may need threads in the future, it's much safer in the long run to do as suggested and make a local instance of the numpy.random.RandomState class (or, in NumPy 1.17 and later, a Generator created with numpy.random.default_rng). As far as I can tell, random.seed() is thread-safe (or at least, I haven't found any evidence to the contrary).
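The "local instance" advice can be sketched as follows. This assumes NumPy 1.17+ for `SeedSequence` and `default_rng`. Each worker thread gets its own NumPy `Generator` and its own stdlib `random.Random`, so no thread touches global generator state. The functions `work` and `run` and the `seed + i` offsets are hypothetical choices for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

import numpy as np

N_WORKERS = 4


def work(rng: np.random.Generator, py_rng: random.Random) -> tuple:
    # Each worker uses only its own generator objects, never the global ones.
    return rng.normal(size=3).tolist(), py_rng.random()


def run(seed: int) -> list:
    # spawn() derives independent child seeds from one parent seed.
    children = np.random.SeedSequence(seed).spawn(N_WORKERS)
    rngs = [np.random.default_rng(s) for s in children]
    # Hypothetical per-worker offsets for the stdlib generators.
    py_rngs = [random.Random(seed + i) for i in range(N_WORKERS)]
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        # map() returns results in input order, so the output is reproducible.
        return list(pool.map(work, rngs, py_rngs))


print(run(2024) == run(2024))  # True: same seed -> same per-worker results
```

Because every result depends only on its own generator, thread scheduling order cannot change the numbers.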

The numpy.random module contains a few extra probability distributions commonly used in scientific research, as well as a couple of convenience functions for generating arrays of random data. The random module is a little more lightweight, and should be fine if you're not doing scientific research or other kinds of statistical work.
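To illustrate the difference in convenience, here is a sketch using the binomial case from the question. It assumes NumPy 1.17+ for `default_rng`. The helper `binomial_stdlib` is hypothetical: the stdlib `random` module had no binomial sampler until Python 3.12 added `random.binomialvariate`, so the helper counts Bernoulli successes one draw at a time.

```python
import random

import numpy as np

rng = np.random.default_rng(0)

# numpy.random: a whole array of binomial samples in one vectorised call.
np_samples = rng.binomial(n=10, p=0.3, size=1000)


def binomial_stdlib(n: int, p: float, rand: random.Random) -> int:
    # Count successes over n Bernoulli(p) trials, one value at a time.
    return sum(rand.random() < p for _ in range(n))


py_rng = random.Random(0)
py_samples = [binomial_stdlib(10, 0.3, py_rng) for _ in range(1000)]

print(np_samples.shape, len(py_samples))  # (1000,) 1000
```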

Otherwise, they both use the Mersenne Twister (MT19937) algorithm to generate their random numbers. This applies to the legacy numpy.random functions; NumPy's newer Generator API defaults to PCG64. Both are completely deterministic. If you know the generator's internal state, it's possible to predict with absolute certainty which number will come next. You can learn that state by knowing the seed, or by observing enough outputs: for MT19937, 624 consecutive 32-bit outputs are enough to reconstruct it. For this reason, neither numpy.random nor random is suitable for any serious cryptographic use. But because the sequence is extremely long (MT19937 has a period of 2**19937 - 1), both are fine for generating random numbers when you aren't worried about people trying to reverse-engineer your data. Determinism is also why seeding works: if you start from the same seed each time, you'll always get the same sequence of random numbers!
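A short sketch of the determinism described above, assuming NumPy is installed. Reseeding replays the same sequence in both libraries. The stdlib state can also be saved and restored with `getstate`/`setstate` instead of reseeding. The last line shows which bit generator `default_rng` uses.

```python
import random

import numpy as np

# Same seed -> same sequence, for the stdlib generator...
random.seed(42)
a = [random.random() for _ in range(3)]
random.seed(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True

# ...and for NumPy's legacy MT19937-based RandomState.
rs1 = np.random.RandomState(42)
rs2 = np.random.RandomState(42)
same_numpy = np.array_equal(rs1.random_sample(3), rs2.random_sample(3))
print(same_numpy)  # True

# Instead of reseeding, the full state can be saved and restored.
state = random.getstate()
x = random.random()
random.setstate(state)
y = random.random()
print(x == y)  # True

# The newer Generator API is built on a different bit generator.
print(type(np.random.default_rng(0).bit_generator).__name__)  # PCG64
```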

As a side note, if you do need cryptographic level randomness, you should use the secrets module, or something like Crypto.Random if you're using a Python version earlier than Python 3.6.
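For reference, a minimal use of the `secrets` module (Python 3.6+). It draws from the operating system's secure randomness source and takes no seed.

```python
import secrets

token = secrets.token_hex(16)  # 16 random bytes as 32 hex characters
pin = secrets.randbelow(10**6)  # integer in [0, 1_000_000)
choice = secrets.choice(["heads", "tails"])

print(len(token), 0 <= pin < 10**6, choice in ("heads", "tails"))  # 32 True True
```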

According to Python for Data Analysis, the numpy.random module supplements Python's built-in random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

By contrast, Python's built-in random module only samples one value at a time, while numpy.random can generate very large samples much faster. Using the IPython magic function %timeit, you can see which module performs faster:

In [1]: import numpy as np
In [2]: from random import normalvariate
In [3]: N = 1000000

In [4]: %timeit samples = [normalvariate(0, 1) for _ in range(N)]
1 loop, best of 3: 963 ms per loop

In [5]: %timeit np.random.normal(size=N)
10 loops, best of 3: 38.5 ms per loop
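Outside IPython, roughly the same comparison can be run as a plain script with the standard `timeit` module. This is a sketch: `stdlib_samples` and `numpy_samples` are hypothetical helper names, and the absolute timings will depend on your machine, so none are claimed here.

```python
import timeit
from random import normalvariate

import numpy as np

N = 1_000_000


def stdlib_samples() -> list:
    # One Python-level function call per sample.
    return [normalvariate(0, 1) for _ in range(N)]


def numpy_samples() -> np.ndarray:
    # A single vectorised call produces the whole array.
    return np.random.normal(size=N)


for fn in (stdlib_samples, numpy_samples):
    # Best of 3 single runs, similar in spirit to %timeit's "best of 3".
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {best * 1000:.1f} ms")
```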

The source of the seed and the distribution used are going to affect the outputs. If you are looking for cryptographic randomness, os.urandom() will get you nearly truly random bytes from the operating system's entropy source (e.g. /dev/urandom on Unix-like systems), which is fed by hardware noise such as network and disk activity. Use those bytes directly rather than as a seed for a deterministic generator.

This avoids supplying a fixed seed and therefore generating deterministic random numbers. However, the random-number functions then let you shape the numbers into a distribution (what I call scientific randomness). Often what you want in the end is random numbers following a particular distribution, such as a bell curve, and numpy is best at delivering this.
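The two uses described above can be sketched side by side, assuming NumPy 1.17+. Cryptographic bytes come straight from `os.urandom()`. For statistical work, a `SeedSequence` created without arguments gathers its entropy from the OS. Logging `seq.entropy` is one possible way (an assumption, not something the answer prescribes) to keep an unseeded run replayable for debugging.

```python
import os

import numpy as np

# Cryptographic use: take bytes straight from the OS entropy source.
key = os.urandom(32)  # 32 unpredictable bytes; no seed involved

# Statistical use: seed NumPy's Generator from OS entropy, but keep the
# entropy value so the run can be replayed later while debugging.
seq = np.random.SeedSequence()  # entropy=None -> gathered from the OS
print("entropy to log:", seq.entropy)

rng = np.random.default_rng(seq)
bell_curve = rng.normal(loc=0.0, scale=1.0, size=5)  # normal ("bell curve") samples

# Replaying with the logged entropy reproduces the same draws.
replay = np.random.default_rng(np.random.SeedSequence(seq.entropy)).normal(0.0, 1.0, 5)
print(np.array_equal(bell_curve, replay))  # True
```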

So yes, stick with one generator, but decide what kind of randomness you want: random but definitely drawn from a distribution curve, or as random as you can get without a quantum device.

It surprised me that the randint(a, b) method exists in both numpy.random and random, but they have different behaviors for the upper bound.

random.randint(a, b) returns a random integer N such that a <= N <= b. It is an alias for randrange(a, b+1), so b is inclusive (see the random documentation).

However, if you call numpy.random.randint(a, b), it returns integers from low (inclusive) to high (exclusive), so b itself is never returned (see the NumPy documentation).
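A quick check of the upper-bound difference, assuming NumPy 1.17+ for the `Generator.integers` part. The legacy `randint(1, 3)` never yields 3. `Generator.integers` is also high-exclusive by default, but its `endpoint=True` option makes it inclusive like `random.randint`. With the legacy API, `np.random.randint(a, b + 1)` gives the same range as `random.randint(a, b)`.

```python
import random

import numpy as np

random.seed(0)
stdlib_values = {random.randint(1, 3) for _ in range(1000)}  # 3 can appear

legacy = np.random.RandomState(0)
legacy_values = set(legacy.randint(1, 3, size=1000).tolist())  # 3 never appears

rng = np.random.default_rng(0)
exclusive_values = set(rng.integers(1, 3, size=1000).tolist())  # high excluded by default
inclusive_values = set(rng.integers(1, 3, size=1000, endpoint=True).tolist())  # like random.randint

print(sorted(stdlib_values), sorted(legacy_values))
```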


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM