
What numbers can I put in numpy.random.seed()?

I have noticed that you can put various numbers inside numpy.random.seed(), for example numpy.random.seed(1) or numpy.random.seed(101). What do the different numbers mean? How do you choose the numbers?

Consider a very basic random number generator:

Z[i] = (a*Z[i-1] + c) % m

Here, Z[i] is the i-th random number, a is the multiplier, c is the increment and m is the modulus; different combinations of a, c and m give different generators. This is known as the linear congruential generator, introduced by Lehmer. The remainder of the division by m (the modulus operation, %) is a number between 0 and m-1, so by setting U[i] = Z[i] / m you get random numbers between zero (inclusive) and one (exclusive).
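A minimal Python sketch of the recurrence above, yielding both Z[i] and U[i]. The values a = 5, c = 3, m = 16 and the seed 1 are illustrative assumptions chosen only to keep the numbers small; real generators use far larger moduli.

```python
from itertools import islice


def lcg(seed, a, c, m):
    """Yield (Z[i], U[i]) for i = 1, 2, ... from Z[i] = (a*Z[i-1] + c) % m."""
    z = seed  # Z[0], the starting value
    while True:
        z = (a * z + c) % m  # remainder of division by m: always 0..m-1
        yield z, z / m       # U[i] = Z[i] / m lies in [0, 1)


# Illustrative parameters only; m = 16 is far too small for real use.
first = list(islice(lcg(seed=1, a=5, c=3, m=16), 4))
print(first)  # [(8, 0.5), (11, 0.6875), (10, 0.625), (5, 0.3125)]
```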

As you may have noticed, to start this generative process you need an initial value: in order to have Z[1], you need Z[0]. This initial value that starts the process is called the seed. Take a look at this example:

(Image not preserved: a worked example of this generator started from the seed Z[0] = 7.)

In that example, the seed is set to 7 to start the process. However, that value is not itself used as a random number; instead, it is used to generate the first Z, namely Z[1].
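A small sketch of the same idea with seed 7. The parameters a = 5, c = 3, m = 16 are an assumption made for illustration, not values taken from the original example. It shows that the seed is only the starting state (7 is not the first output), that reseeding with 7 replays the same sequence, and that a different seed simply starts somewhere else.

```python
def lcg_sequence(seed, a=5, c=3, m=16, count=8):
    """Return Z[1..count] for Z[i] = (a*Z[i-1] + c) % m; the seed is Z[0]."""
    z = seed
    out = []
    for _ in range(count):
        z = (a * z + c) % m
        out.append(z)
    return out


run1 = lcg_sequence(seed=7)
run2 = lcg_sequence(seed=7)
print(run1)                      # [6, 1, 8, 11, 10, 5, 12, 15]
print(run1 == run2)              # True: same seed, same sequence
print(lcg_sequence(seed=8)[:3])  # [11, 10, 5]: a different starting point
```

With these particular parameters the generator visits all 16 values before repeating, so seed 8 lands on the same cycle as seed 7, just at a different position; this is one way to picture what "choosing a seed" does for a generator like this.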

The most important feature of a pseudo-random number generator would be its unpredictability. Generally, as long as you don't share your seed, any seed is fine, since today's generators are much more complex than this one. As a further step, however, you can generate the seed randomly as well. Another option is to skip the first n numbers.
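A sketch of both suggestions with NumPy's legacy RandomState: the seed is drawn with Python's `secrets.randbits(32)` (a choice made here, not something the answer specifies), and the first n values are discarded. The helper name `make_stream` and the value n = 1000 are hypothetical. Recording the seed and n is what lets you replay the run later.

```python
import secrets

import numpy as np


def make_stream(seed, skip):
    """Seed a legacy RandomState, then throw away its first `skip` outputs."""
    rng = np.random.RandomState(seed)
    rng.random_sample(skip)  # burn-in: these values are discarded
    return rng


# Draw the seed itself from the OS's cryptographic source (0 .. 2**32 - 1).
seed = secrets.randbits(32)
skip = 1000  # arbitrary illustrative value of n

a = make_stream(seed, skip).random_sample(3)
b = make_stream(seed, skip).random_sample(3)
print(np.array_equal(a, b))  # True: same seed and same skip give the same stream
```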

Main source: Law, A. M. (2007). Simulation Modeling and Analysis. Tata McGraw-Hill.

The short answer:

There are three ways to seed() a random number generator in numpy.random:

  • use no argument or use None -- the RNG initializes itself from the OS's random number generator (which is generally cryptographically random)

  • use some 32-bit integer N -- the RNG will use this to initialize its state based on a deterministic function (same seed → same state)

  • use an array-like sequence of 32-bit integers n0, n1, n2, etc. -- again, the RNG will use this to initialize its state based on a deterministic function (same values for seed → same state). This is intended to be done with a hash function of sorts, although there are magic numbers in the source code and it's not clear why they are doing what they're doing.
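The three options side by side, as a quick sketch using the global numpy.random functions (the specific seeds 101 and [1, 2, 3] are arbitrary):

```python
import numpy as np

# 1. No argument, or None: state comes from OS entropy, so it differs run to run.
np.random.seed()
np.random.seed(None)

# 2. One integer in 0 .. 2**32 - 1: deterministic.
np.random.seed(101)
x1 = np.random.rand(3)
np.random.seed(101)
x2 = np.random.rand(3)

# 3. A sequence of such integers: also deterministic.
np.random.seed([1, 2, 3])
y1 = np.random.rand(3)
np.random.seed([1, 2, 3])
y2 = np.random.rand(3)

print(np.array_equal(x1, x2), np.array_equal(y1, y2))  # True True
```

Note that seed(101) and seed([101]) go through different branches of the seeding code (a single-integer initializer versus init_by_array, as the source quoted in the longer answer shows), so they should not be expected to produce the same stream.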

If you want to do something repeatable and simple, use a single integer.

If you want to do something repeatable but unlikely for a third party to guess, use a tuple, a list, or a numpy array containing some sequence of 32-bit integers. You could, for example, use numpy.random with a seed of None to generate a bunch of 32-bit integers from the OS's RNG (say, 32 of them, for a total of 1024 bits), store them as a seed S that you keep in some secret place, and then use that seed to generate whatever sequence R of pseudorandom numbers you wish. You can later recreate that sequence by re-seeding with S, and as long as you keep the value of S secret (as well as the generated numbers R), no one would be able to reproduce that sequence R. If you just use a single integer, there are only about 4 billion (2**32) possibilities, and someone could potentially try them all. That may be a bit on the paranoid side, but you could do it.
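A sketch of that workflow with the legacy RandomState API. The names S and R follow the text; the 32-word size matches the example above, and how you actually store S securely is left out.

```python
import numpy as np

# Step 1: a generator seeded from the OS (seed None) produces 32 words of
# 32 bits each, 1024 bits in total. This list is the secret seed S.
entropy_rng = np.random.RandomState(None)
S = [int(v) for v in entropy_rng.randint(0, 2**32, size=32, dtype=np.int64)]

# Step 2: derive the pseudorandom sequence R from S.
R = np.random.RandomState(S).random_sample(5)

# Later: re-seeding with the same S recreates exactly the same R.
R_again = np.random.RandomState(S).random_sample(5)
print(np.array_equal(R, R_again))  # True
```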


Longer answer

The numpy.random module uses the Mersenne Twister algorithm, which you can confirm yourself.
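One quick check, as a sketch: the legacy global generator reports its algorithm name in the tuple returned by `np.random.get_state()`, together with its 624-word state.

```python
import numpy as np

state = np.random.get_state()  # legacy tuple describing the global generator
print(state[0])                # MT19937
print(state[1].shape)          # (624,): the Mersenne Twister's 32-bit state words
```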

In any case, here's what the numpy.random.RandomState documentation says about seed():

Compatibility Guarantee: A fixed seed and a fixed series of calls to RandomState methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect. Incorrect values will be fixed and the NumPy version in which the fix was made will be noted in the relevant docstring. Extension of existing parameter ranges and the addition of new parameters is allowed as long the previous behavior remains unchanged.

Parameters:
seed : {None, int, array_like}, optional

Random seed used to initialize the pseudo-random number generator. Can be any integer between 0 and 2**32 - 1 inclusive, an array (or other sequence) of such integers, or None (the default). If seed is None, then RandomState will try to read data from /dev/urandom (or the Windows analogue) if available or seed from the clock otherwise.
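A sketch that exercises both parts of the quoted documentation: a fixed seed with a fixed series of calls (the particular calls and the seed 2**32 - 1 are arbitrary choices), and the 0 to 2**32 - 1 limit. The helper name `fixed_calls` is hypothetical.

```python
import numpy as np


def fixed_calls(seed):
    """A fixed series of RandomState calls with fixed parameters."""
    rng = np.random.RandomState(seed)
    return (rng.randint(0, 10, size=5),
            rng.normal(0.0, 1.0, size=3),
            rng.permutation(6))


# Same seed + same calls -> same results (the largest allowed seed is used here).
a, b = fixed_calls(2**32 - 1), fixed_calls(2**32 - 1)
print(all(np.array_equal(x, y) for x, y in zip(a, b)))  # True

# Values outside 0 .. 2**32 - 1, alone or inside a sequence, raise ValueError.
rejected = []
for bad in (-1, 2**32, [1, 2**32]):
    try:
        np.random.RandomState(bad)
    except ValueError:
        rejected.append(bad)
print(rejected)  # [-1, 4294967296, [1, 4294967296]]
```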

It doesn't say how the seed is used, but if you dig into the source code, you'll find that sequence seeds are passed to the init_by_array function (docstring elided):

def seed(self, seed=None):
    cdef rk_error errcode
    cdef ndarray obj "arrayObject_obj"
    try:
        if seed is None:
            # No seed: initialize from OS entropy (or the clock, per the docs)
            with self.lock:
                errcode = rk_randomseed(self.internal_state)
        else:
            # A single integer (anything operator.index accepts) must fit in 32 bits
            idx = operator.index(seed)
            if idx > int(2**32 - 1) or idx < 0:
                raise ValueError("Seed must be between 0 and 2**32 - 1")
            with self.lock:
                rk_seed(idx, self.internal_state)
    except TypeError:
        # Not an integer: treat seed as a sequence of 32-bit integers
        obj = np.asarray(seed).astype(np.int64, casting='safe')
        if ((obj > int(2**32 - 1)) | (obj < 0)).any():
            raise ValueError("Seed must be between 0 and 2**32 - 1")
        obj = obj.astype('L', casting='unsafe')
        with self.lock:
            # Mix every element of the sequence into the state (see below)
            init_by_array(self.internal_state, <unsigned long *>PyArray_DATA(obj),
                PyArray_DIM(obj, 0))
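To make the branching above easier to follow, here is a hypothetical pure-Python helper, `classify_seed`, that mirrors the same checks and reports which initializer the quoted code would call. It is an illustration of the control flow only; it does not seed anything and is not part of NumPy.

```python
import operator

import numpy as np

MAX_SEED = 2**32 - 1


def classify_seed(seed):
    """Mirror the branches of seed() above; return the initializer's name."""
    if seed is None:
        return "rk_randomseed"       # OS entropy (or the clock)
    try:
        idx = operator.index(seed)   # ints and integer-like objects only
    except TypeError:
        # Sequence branch: 'safe' casting rejects floats, strings, etc.
        obj = np.asarray(seed).astype(np.int64, casting="safe")
        if ((obj > MAX_SEED) | (obj < 0)).any():
            raise ValueError("Seed must be between 0 and 2**32 - 1")
        return "init_by_array"       # sequence of 32-bit words
    if idx > MAX_SEED or idx < 0:
        raise ValueError("Seed must be between 0 and 2**32 - 1")
    return "rk_seed"                 # single 32-bit integer


print(classify_seed(None), classify_seed(101), classify_seed([1, 2, 3]))
```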

And here's what the init_by_array function looks like:

extern void
init_by_array(rk_state *self, unsigned long init_key[], npy_intp key_length)
{
    /* was signed in the original code. RDH 12/16/2002 */
    npy_intp i = 1;
    npy_intp j = 0;
    unsigned long *mt = self->key;
    npy_intp k;

    init_genrand(self, 19650218UL);
    k = (RK_STATE_LEN > key_length ? RK_STATE_LEN : key_length);
    for (; k; k--) {
        /* non linear */
        mt[i] = (mt[i] ^ ((mt[i - 1] ^ (mt[i - 1] >> 30)) * 1664525UL))
            + init_key[j] + j;
        /* for > 32 bit machines */
        mt[i] &= 0xffffffffUL;
        i++;
        j++;
        if (i >= RK_STATE_LEN) {
            mt[0] = mt[RK_STATE_LEN - 1];
            i = 1;
        }
        if (j >= key_length) {
            j = 0;
        }
    }
    for (k = RK_STATE_LEN - 1; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
             - i; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++;
        if (i >= RK_STATE_LEN) {
            mt[0] = mt[RK_STATE_LEN - 1];
            i = 1;
        }
    }

    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
    self->gauss = 0;
    self->has_gauss = 0;
    self->has_binomial = 0;
}

This essentially "munges" the random number state in a nonlinear, hash-like way, using each value within the provided sequence of seed values.
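To see the munging concretely, here is a pure-Python transliteration of the C code above, plus the single-integer fill it starts from (`init_genrand`, assumed here to follow the standard Mersenne Twister reference recurrence). Comparing against `get_state()` is a way to check the port against whatever NumPy version you have installed; the key [1, 2, 3] and the integer 5 are arbitrary.

```python
import numpy as np

N = 624            # RK_STATE_LEN in the C code
MASK = 0xFFFFFFFF  # keep every word to 32 bits, like the "&= 0xffffffffUL" lines


def init_genrand(s):
    """Fill the whole state from a single 32-bit integer."""
    mt = [s & MASK]
    for i in range(1, N):
        mt.append((1812433253 * (mt[i - 1] ^ (mt[i - 1] >> 30)) + i) & MASK)
    return mt


def init_by_array(init_key):
    """Pure-Python transliteration of the C init_by_array above."""
    mt = init_genrand(19650218)
    i, j = 1, 0
    for _ in range(max(N, len(init_key))):
        # nonlinear mix of the previous word, plus the key word and its index
        mt[i] = ((mt[i] ^ ((mt[i - 1] ^ (mt[i - 1] >> 30)) * 1664525))
                 + init_key[j] + j) & MASK
        i += 1
        j += 1
        if i >= N:
            mt[0] = mt[N - 1]
            i = 1
        if j >= len(init_key):
            j = 0
    for _ in range(N - 1):
        # second nonlinear pass over the state, without the key
        mt[i] = ((mt[i] ^ ((mt[i - 1] ^ (mt[i - 1] >> 30)) * 1566083941))
                 - i) & MASK
        i += 1
        if i >= N:
            mt[0] = mt[N - 1]
            i = 1
    mt[0] = 0x80000000  # MSB is 1: the state can never be all zeros
    return mt


key = [1, 2, 3]
ours = init_by_array(key)
theirs = np.random.RandomState(key).get_state()[1]  # NumPy's 624 state words
print(np.array_equal(ours, theirs))  # True
```

Changing any single element of the key changes essentially every word of the resulting state, which is the hash-like behaviour described above.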

What is normally called a random number sequence is, in reality, a "pseudo-random" number sequence, because the values are computed using a deterministic algorithm and probability plays no real role.

The "seed" is a starting point for the sequence, and the guarantee is that if you start from the same seed, you will get the same sequence of numbers. This is very useful, for example, for debugging: when you are looking for an error in a program, you need to be able to reproduce the problem and study it, and a non-deterministic program would be much harder to debug because every run would be different.

Basically, the number guarantees the same 'randomness' every time.

More precisely, the number is a seed, which can be an integer between 0 and 2**32 - 1, an array (or other sequence) of such integers of any length, or the default (None). If seed is None, then numpy.random will try to read data from /dev/urandom if available, or make a seed from the clock otherwise.

Edit: In all honesty, as long as your program isn't something that needs to be super secure, it shouldn't matter what you pick. If it does need to be secure, don't use these methods; use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator.
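A short sketch of the two standard-library options named above. Both draw from the operating system's secure source, which also means there is no seed to replay: `SystemRandom.seed()` is accepted but ignored.

```python
import os
import random

# Raw bytes from the operating system's cryptographically secure source.
token = os.urandom(16)

# SystemRandom offers the usual random.Random methods on top of os.urandom().
# Its seed() method has no effect, so a sequence can never be replayed.
sysrand = random.SystemRandom()
sysrand.seed(101)  # ignored
roll = sysrand.randint(1, 6)
fraction = sysrand.random()

print(len(token), 1 <= roll <= 6, 0.0 <= fraction < 1.0)  # 16 True True
```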

The most important concept to understand here is pseudo-randomness. Once you understand this idea, you can determine whether your program really needs a seed, etc. I'd recommend reading here.

To understand the meaning of random seeds, you first need to understand "pseudo-random" number sequences: the values are computed using a deterministic algorithm.

So you can think of this number as the starting value used to calculate the next number you get from the random generator. Putting the same value here will make your program get the same "random" values every time, so your program becomes deterministic.

As said in this post:

they (numpy.random and random.random) both use the Mersenne Twister sequence to generate their random numbers, and they're both completely deterministic - that is, if you know a few key bits of information, it's possible to predict with absolute certainty what number will come next.
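To make the "completely deterministic" point concrete, here is a sketch of the well-known Mersenne Twister state-recovery technique, run against CPython's `random` module because its `getrandbits(32)` returns raw 32-bit outputs. Assumptions: the observer sees 624 consecutive whole outputs starting right after seeding, and the seed 12345 is arbitrary. The script inverts the output tempering, rebuilds the internal state, and predicts the next values without knowing the seed.

```python
import random

MASK = 0xFFFFFFFF


def undo_right(y, shift):
    """Invert y ^= y >> shift by repeating until every bit is recovered."""
    result = y
    for _ in range(32 // shift + 1):
        result = y ^ (result >> shift)
    return result


def undo_left(y, shift, mask):
    """Invert y ^= (y << shift) & mask."""
    result = y
    for _ in range(32 // shift + 1):
        result = y ^ ((result << shift) & mask)
    return result & MASK


def untemper(y):
    y = undo_right(y, 18)
    y = undo_left(y, 15, 0xEFC60000)
    y = undo_left(y, 7, 0x9D2C5680)
    return undo_right(y, 11)


def temper(y):
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & MASK


def twist(mt):
    """Regenerate all 624 state words, as MT19937 does after 624 outputs."""
    n, m = 624, 397
    mt = list(mt)
    for i in range(n):
        y = (mt[i] & 0x80000000) | (mt[(i + 1) % n] & 0x7FFFFFFF)
        mt[i] = mt[(i + m) % n] ^ (y >> 1) ^ (0x9908B0DF if y & 1 else 0)
    return mt


victim = random.Random(12345)  # the observer never uses this seed
observed = [victim.getrandbits(32) for _ in range(624)]

state = twist([untemper(v) for v in observed])  # rebuild, then step forward
predicted = [temper(w) for w in state[:5]]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)  # True
```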

If you really care about randomness, ask the user to generate some noise (some arbitrary words), or just use the system time as the seed.
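A sketch of both ideas. Hashing the user's text with SHA-256 and splitting the digest into eight 32-bit words is an assumption made here (any stable hash would do), and `seed_from_text` is a hypothetical helper name. The clock value is reduced modulo 2**32 so it fits the range numpy.random accepts; keep in mind that a clock seed can be guessed by anyone who knows roughly when the program ran.

```python
import hashlib
import time

import numpy as np


def seed_from_text(text):
    """Hash arbitrary user-typed text into eight 32-bit words for seeding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    return [int.from_bytes(digest[i:i + 4], "little") for i in range(0, 32, 4)]


# Option 1: user-supplied "noise" (this phrase is only a placeholder).
words = seed_from_text("purple elephant tuesday 42")
rng_words = np.random.RandomState(words)

# Option 2: the system clock, reduced into the accepted 0 .. 2**32 - 1 range.
clock_seed = time.time_ns() % 2**32
rng_clock = np.random.RandomState(clock_seed)

print(len(words), rng_words.randint(0, 100), rng_clock.randint(0, 100))
```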

If your code runs on an Intel CPU (or an AMD CPU with the newest chips), I also suggest checking out the RdRand package, which uses the CPU instruction rdrand to collect "true" (hardware) randomness.

Refs:

  1. Random seed
  2. What is a seed in terms of generating a random number

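A very specific answer: np.random.seed can take values between 0 and 2**32 - 1, which, interestingly, differs from random.seed, which accepts integers of any size as well as floats, str, bytes and bytearray values (and, in Python versions before 3.11, any hashable object).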
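A quick sketch of the difference (Python 3.11+ rules assumed for random.seed; the seed values are arbitrary):

```python
import random

import numpy as np

# random.seed: integers of any size and text are both accepted.
random.seed(2**100)
random.seed("any text works")
first = random.random()
random.seed("any text works")
again = random.random()

# numpy.random.seed: only 0 .. 2**32 - 1, or a sequence of such values.
try:
    np.random.seed(2**100)
    numpy_accepted = True
except ValueError:
    numpy_accepted = False

print(first == again, numpy_accepted)  # True False
```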

A side comment: it is better to set your seed to a rather large number that is still within the generator's limit. Such a seed is more likely to have a good balance of 0 and 1 bits. Avoid having many 0 bits in the seed.
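A small sketch for checking the bit balance this advice refers to. The helper `ones_in_seed` is hypothetical; it only counts bits and makes no claim about how much the balance matters for NumPy's generator specifically. The pattern 0x5A5A5A5A is just an example of an evenly split 32-bit value.

```python
import secrets


def ones_in_seed(seed):
    """Count the 1 bits in a 32-bit seed as a rough balance check."""
    return bin(seed & 0xFFFFFFFF).count("1")


print(ones_in_seed(1))           # 1: a small seed is almost all 0 bits
print(ones_in_seed(0x5A5A5A5A))  # 16: an even split of 0 and 1 bits

# A seed drawn uniformly from all 32-bit values has 16 one bits on average.
seed = secrets.randbits(32)
print(seed, ones_in_seed(seed))
```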

Reference: PyTorch documentation

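Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.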

 
© 2020-2024 STACKOOM.COM