
Performance difference between numpy.random and random.random in Python

I want to see which random number generator package is faster in my neural network.

I am currently modifying code from GitHub in which both the numpy.random and random packages are used to generate random integers, random choices, random samples, etc.

I am changing this code because, for research purposes, I would like to set a global seed so that I can compare accuracy across different hyperparameter settings. The problem is that at the moment I have to set two global seeds, one for the random package and one for the numpy package. Ideally, I would like to set only one seed, since draws from two separately seeded generator sequences might become correlated more quickly.

However, I do not know which package will perform better in terms of speed: numpy or random. So I would like to find seeds for both packages that correspond to exactly the same Mersenne Twister sequence. That way, the draws for both models are the same, and therefore the number of iterations in each gradient descent step is also the same, so any difference in speed is caused only by the package I use.

I could not find any documentation on pairs of seeds that produce the same random number sequence in both packages, and trying out all kinds of combinations seems cumbersome.

I have tried the following:

import random
import numpy as np

np.random.seed(1)
numpy_1=np.random.randint(0,101)
numpy_2=np.random.randint(0,101)
numpy_3=np.random.randint(0,101)
numpy_4=np.random.randint(0,101)
# brute-force search for a stdlib seed whose first four draws match numpy's
# (note: random.randint includes 101, np.random.randint excludes it)
for i in range(20000000):
    random.seed(i)
    random_1=random.randint(0,101)
    if random_1==numpy_1:
        random_2=random.randint(0,101)
        if random_2==numpy_2:
            random_3=random.randint(0,101)
            if random_3==numpy_3:
                random_4=random.randint(0,101)
                if random_4==numpy_4:
                    break
print(np.random.randint(0,101))
print(random.randint(0,101))

But this did not really work, as could be expected.

numpy.random and Python's random work in different ways, although, as you say, they use the same algorithm.

In terms of seed: you can use the get_state and set_state functions from numpy.random (called getstate and setstate in Python's random) and pass the state from one to the other. The structure is slightly different: in Python the position integer is appended as the last element of the tuple of state words, whereas numpy stores it as a separate entry. See the docs for numpy.random.get_state() and random.getstate():

import random
import numpy as np

random.seed(10)
s1 = list(np.random.get_state())
s2 = list(random.getstate())

# s2[1] holds 624 unsigned 32-bit state words followed by the position index
s1[1] = np.array(s2[1][:-1], dtype=np.uint32)
s1[2] = s2[1][-1]

np.random.set_state(tuple(s1))

print(np.random.random())
print(random.random())

Output:

0.5714025946899135
0.5714025946899135

In terms of efficiency: it depends on what you want to do, but numpy is usually better because you can create arrays of elements without needing a loop:

%timeit np.random.random(10000)
142 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit [random.random() for i in range(10000)]
1.48 ms ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
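
Outside IPython, the same comparison can be sketched with the standard timeit module. This is only an illustration: the batch size and repeat count are arbitrary choices, the helper names numpy_batch and stdlib_batch are made up for this example, and absolute timings depend on the machine.

```python
import random
import timeit

import numpy as np

N = 10_000       # number of floats per batch (arbitrary)
REPEATS = 100    # how many times each function is timed (arbitrary)

def numpy_batch():
    # One vectorized call returns an array of N floats in [0, 1).
    return np.random.random(N)

def stdlib_batch():
    # The stdlib generator returns one float per call, so a Python loop is needed.
    return [random.random() for _ in range(N)]

# Average wall-clock time per batch, in seconds.
numpy_time = timeit.timeit(numpy_batch, number=REPEATS) / REPEATS
stdlib_time = timeit.timeit(stdlib_batch, number=REPEATS) / REPEATS

print(f"numpy : {numpy_time * 1e6:.1f} µs per batch")
print(f"random: {stdlib_time * 1e6:.1f} µs per batch")
```

This only measures batch generation. If the network draws one scalar at a time, the result may look different, so it is worth timing the call pattern the code actually uses.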

In terms of "randomness", numpy is, according to its docs, also better:

Notes: The Python stdlib module "random" also contains a Mersenne Twister pseudo-random number generator with a number of methods that are similar to the ones available in RandomState. RandomState, besides being NumPy-aware, has the advantage that it provides a much larger number of probability distributions to choose from.

Consider the following dirty hack:

import random
import numpy as np

random.seed(42)
np.random.seed(42)

print(random.random(), np.random.random())

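# copy the numpy random module state to the python random module
a = random.getstate()
b = np.random.get_state()
# b[1] holds the 624 state words and b[2] the position index;
# python expects the position appended to the tuple of words
a2 = (a[0], tuple(int(val) for val in b[1]) + (int(b[2]),), *a[2:])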
random.setstate(a2)

print(random.random(), np.random.random())

Output:

0.6394267984578837 0.3745401188473625  # different
0.9507143064099162 0.9507143064099162  # same

I am not sure whether this approach is really consistent across all the possibilities of both implementations.
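
One way to gain some confidence is to compare a longer run of draws after copying the state. The sketch below is a spot check, not a proof. copy_numpy_state_to_random is a hypothetical helper name, and only random() is compared, because other methods such as randint are not guaranteed to consume the generator output in the same way in both libraries.

```python
import random

import numpy as np

def copy_numpy_state_to_random():
    """Copy the global legacy np.random state into the stdlib random module."""
    # Legacy state tuple: (name, 624 state words, position, has_gauss, cached_gaussian)
    _, key, pos, *_ = np.random.get_state()
    version, _, gauss_next = random.getstate()
    # The stdlib expects the position appended to the tuple of state words.
    internal = tuple(int(word) for word in key) + (int(pos),)
    random.setstate((version, internal, gauss_next))

np.random.seed(42)            # seed value is arbitrary
copy_numpy_state_to_random()

# Compare 1000 consecutive floats drawn from both modules.
matches = all(random.random() == np.random.random() for _ in range(1000))
print(matches)
```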

Duplicate of this post.

The answer depends on your needs:
- Cryptography / security: secrets
- Scientific research: numpy
- Common use: random
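
As a rough illustration of the three options in this list, here is a minimal sketch. The seed values and sizes are arbitrary. np.random.default_rng is assumed to be available (NumPy 1.17 or later); it uses a different bit generator than the Mersenne Twister discussed above, so its streams will not match the stdlib.

```python
import random
import secrets

import numpy as np

# Security-sensitive values: secrets draws from the OS entropy source
# and deliberately cannot be seeded.
token = secrets.token_hex(16)        # 32 hexadecimal characters
pin = secrets.randbelow(10_000)      # integer in [0, 10000)

# Numerical work: a seeded NumPy Generator gives reproducible arrays.
rng = np.random.default_rng(12345)
weights = rng.normal(loc=0.0, scale=1.0, size=(3, 2))

# Everyday scripting: a seeded stdlib generator instance.
py_rng = random.Random(12345)
choice = py_rng.choice(["a", "b", "c"])

print(token, pin)
print(weights.shape)
print(choice)
```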

Just a quick recap. NumPy's np.random.randint(a, b) is different from random.randint(a, b): the NumPy version excludes the upper bound b, while the stdlib version includes it.

print(np.random.randint(0,101)) # returns a value in [0, 101), 101 exclusive
print(random.randint(0,101)) # returns a value in [0, 101], 101 inclusive
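
If the two calls should cover the same range, the stdlib's random.randrange excludes the upper bound just like NumPy does. The sketch below only checks the bounds empirically. The sample size is arbitrary, and it makes no claim that the two libraries produce the same integers from the same state.

```python
import random

import numpy as np

LOW, HIGH = 0, 101    # same arguments as in the two calls above
SAMPLES = 100_000     # arbitrary sample size

np_draws = np.random.randint(LOW, HIGH, size=SAMPLES)                  # HIGH excluded
py_inclusive = [random.randint(LOW, HIGH) for _ in range(SAMPLES)]     # HIGH included
py_exclusive = [random.randrange(LOW, HIGH) for _ in range(SAMPLES)]   # HIGH excluded

print(int(np_draws.max()) <= HIGH - 1)
print(max(py_inclusive) <= HIGH)
print(max(py_exclusive) <= HIGH - 1)
```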
