
Why is the use of rand() considered bad?

Originally posted 2018-10-18 07:38:10. Tags: c++, random, lcg

I have heard people say that using rand() is bad even after calling srand() to set a seed. Why is that? I would like to understand what actually goes wrong. And, sorry for the extra question, but what is the alternative?

7 answers

There are two parts to this story.

First, rand is a pseudorandom number generator. This means it depends on a seed: for a given seed it always produces the same sequence (assuming the same implementation). That makes it unsuitable for applications where security is a major concern. But this is not specific to rand; it is true of any pseudorandom generator, and there are certainly many classes of problems where a pseudorandom generator is acceptable. A true random generator has its own issues (efficiency, implementation, entropy), so for problems that are not security related a pseudorandom generator is what is used most often.

So you have analyzed your problem and concluded that a pseudorandom generator is the solution. Here we arrive at the real troubles with the C random library (which includes rand and srand): problems that are specific to it and make it obsolete (that is, the reasons you should never use rand and the C random library).

One issue is that it has global state (set by srand). This makes it impossible to use multiple random engines at the same time, and it greatly complicates multithreaded code.
The most visible problem is that it lacks a distribution engine: rand gives you a number in the interval [0, RAND_MAX]. It is uniform over that interval, meaning each number in it has the same probability of appearing. But most often you need a random number in some specific interval, say [0, 1017]. A commonly used (and naive) formula is rand() % 1018. The problem is that unless the number of possible raw values, RAND_MAX + 1, is an exact multiple of 1018, you will not get a uniform distribution.
Another issue is the quality of implementation of rand. Other answers here detail this better than I could, so please read them.
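
To put a number on the modulo bias described above, here is a minimal sketch that counts how many raw values land on each remainder. The helper name preimage_count is made up for this example, and the RAND_MAX value of 32767 mentioned afterwards is an assumption (it is the smallest value the C standard permits; many platforms use a larger one).

```cpp
#include <cassert>
#include <cstdint>

// Number of values v in [0, range_max] with v % n == r.
// range_max stands in for RAND_MAX; n is the modulus (1018 in the text).
std::uint64_t preimage_count(std::uint64_t range_max,
                             std::uint64_t n,
                             std::uint64_t r) {
    assert(n > 0 && r < n);
    const std::uint64_t total = range_max + 1;  // how many raw values exist
    const std::uint64_t full  = total / n;      // complete passes over 0..n-1
    const std::uint64_t extra = total % n;      // leftovers hit residues 0..extra-1
    return full + (r < extra ? 1 : 0);
}
```

With an assumed RAND_MAX of 32767 there are 32768 raw values, and 32768 = 32 * 1018 + 192. So preimage_count(32767, 1018, r) is 33 for r in 0..191 and 32 for r in 192..1017: the low results are about 3% more likely than the high ones.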

In modern C++ you should definitely use the C++ <random> library, which comes with multiple well-defined random engines and various distributions for integer and floating-point types.
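
As a rough illustration of the <random> approach, the sketch below draws integers from [lo, hi] with an engine the caller owns. The function name draw_uniform is invented for this example; std::mt19937 is just one of the standard engines and is not necessarily the best choice for every use.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw `count` integers uniformly from [lo, hi] using a caller-owned engine.
// The engine is passed by reference, so there is no hidden global state.
std::vector<int> draw_uniform(std::mt19937& engine, int lo, int hi, int count) {
    assert(lo <= hi && count >= 0);
    std::uniform_int_distribution<int> dist(lo, hi);  // closed interval [lo, hi]
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        out.push_back(dist(engine));  // the distribution handles the range mapping
    }
    return out;
}
```

For example, std::mt19937 a(42), b(42); creates two independent engines, and draw_uniform(a, 0, 1017, 5) and draw_uniform(b, 0, 1017, 5) return the same five values on a given standard library, because each engine carries its own state. For runs that should differ each time, the engine can be seeded from std::random_device instead of a fixed number.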

None of the answers here explains the real reason rand() is bad.

rand() is a pseudo-random number generator (PRNG), but this doesn't mean it must be bad. In fact, there are very good PRNGs whose output is statistically hard or impossible to distinguish from true random numbers.

rand() is completely implementation-defined, but historically it has been implemented as a Linear Congruential Generator (LCG), a class of PRNGs that is usually fast but notoriously poor. The lower bits of these generators have much lower statistical randomness than the higher bits, and the generated numbers can produce visible lattice and/or planar structures (the best example of that is the famous RANDU PRNG). Some implementations try to reduce the low-bits problem by shifting the bits right by a predefined amount, but this kind of fix also reduces the range of the output.
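
The weak low bits are easy to see in a toy LCG. The sketch below is only an illustration: the multiplier and increment are the widely quoted "Numerical Recipes" constants, chosen as an assumption for the demo, and no particular rand() is claimed to use them. With a power-of-two modulus and odd multiplier and increment, the lowest bit simply flips on every step.

```cpp
#include <cassert>
#include <cstdint>

// A toy LCG with modulus 2^32: state <- a * state + c (mod 2^32).
// Constants are illustrative only (assumption, see the text above).
struct ToyLcg {
    std::uint32_t state;
    std::uint32_t next() {
        state = 1664525u * state + 1013904223u;  // unsigned arithmetic wraps mod 2^32
        return state;
    }
};

// Returns true if the lowest bit of `n` consecutive outputs strictly alternates.
bool low_bit_alternates(std::uint32_t seed, int n) {
    assert(n >= 2);
    ToyLcg g{seed};
    std::uint32_t prev = g.next() & 1u;
    for (int i = 1; i < n; ++i) {
        const std::uint32_t bit = g.next() & 1u;
        if (bit == prev) return false;  // two equal low bits in a row
        prev = bit;
    }
    return true;
}
```

low_bit_alternates returns true for any seed, which is why code such as next() % 2 on a generator like this yields 0, 1, 0, 1, ... rather than anything resembling coin flips. More generally, for a power-of-two modulus the k-th lowest bit has a period of at most 2^k.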

Still, there are notable examples of excellent LCGs, such as L'Ecuyer's 64-bit and 128-bit multiplicative linear congruential generators presented in Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure, Pierre L'Ecuyer, 1999.

The general rule of thumb is: don't trust rand(); pick your own pseudo-random number generator, one that fits your needs and usage requirements.

What is bad about rand/srand is that rand:

uses an unspecified algorithm for the sequence of numbers it generates, yet
allows that algorithm to be initialized with srand for repeatable "randomness".

These two points, taken together, hamper the ability of implementations to improve on rand's implementation (e.g., to use a cryptographic random number generator [RNG] or an otherwise "better" algorithm for producing pseudorandom numbers). For example, JavaScript's Math.random and FreeBSD's arc4random don't have this problem, since they don't allow applications to seed them for repeatable "randomness"; it is for exactly this reason that the V8 JavaScript engine was able to change its Math.random implementation to a variant of xorshift128+ while preserving backward compatibility. (On the other hand, letting applications supply additional data to supplement "randomness", as in BCryptGenRandom, is less problematic; even so, this is generally seen only in cryptographic RNGs.)

Also:

The fact that the algorithm and the seeding procedure for rand and srand are unspecified means that even reproducible "randomness" is not guaranteed between rand/srand implementations, between versions of the same standard library, between operating systems, etc.
If srand is not called before rand is, rand behaves as though srand(1) had been called first. In practice, this means that rand can only be implemented as a pseudorandom number generator (PRNG) rather than as a nondeterministic RNG, and that rand's PRNG algorithm can't differ in a given implementation depending on whether the application calls srand or not.
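
The srand(1) default can be checked with a small sketch like the one below. The helper names take_rand and take_rand_seeded are invented for this example, and the comparison only makes sense within a single implementation and a single process.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Collect the next `count` values from the global rand() stream.
std::vector<int> take_rand(int count) {
    assert(count >= 0);
    std::vector<int> out;
    for (int i = 0; i < count; ++i) {
        out.push_back(std::rand());
    }
    return out;
}

// Seed the global generator, then collect `count` values.
std::vector<int> take_rand_seeded(unsigned seed, int count) {
    std::srand(seed);
    return take_rand(count);
}
```

In a fresh process where nothing has called rand or srand yet, take_rand(5) should equal take_rand_seeded(1, 5). Likewise, two calls to take_rand_seeded(7, 5) return the same values on a given implementation, though those values may differ on another compiler or library version.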

EDIT (Jul. 8, 2020):

There is one more important thing that's bad about rand and srand. Nothing in the C standard for these functions specifies a particular distribution that the "pseudo-random numbers" delivered by rand have to follow, including the uniform distribution or even a distribution that approximates the uniform distribution. Contrast this with C++'s uniform_int_distribution and uniform_real_distribution classes, as well as the specific pseudorandom generator algorithms specified by C++, such as linear_congruential_engine and mt19937.

EDIT (Dec. 12, 2020):

Yet another bad thing about rand and srand: srand takes a seed that can only be as big as an unsigned int. In most mainstream C implementations today, unsigned int is 32 bits long, meaning that only 2^32 different sequences of numbers can be selected this way, even if the underlying algorithm implemented by rand can produce many more different sequences than that (say, 2^128 or even 2^19937, as in C++'s mt19937).
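
By contrast, the <random> engines accept seed material wider than one unsigned int through std::seed_seq. The sketch below shows one way to do that; the function names are made up for this example, and the choice of eight 32-bit words in the second helper is an arbitrary assumption, not a recommendation.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Build an mt19937 from several 32-bit seed words instead of the single
// unsigned int that srand() accepts.
std::mt19937 engine_from_words(const std::vector<std::uint32_t>& words) {
    assert(!words.empty());
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937(seq);
}

// Convenience wrapper: pull eight seed words (256 bits) from std::random_device.
// The standard allows random_device to be deterministic on platforms without
// an entropy source, so this is best effort.
std::mt19937 engine_from_random_device() {
    std::random_device rd;
    std::vector<std::uint32_t> words(8);
    for (auto& w : words) {
        w = static_cast<std::uint32_t>(rd());
    }
    return engine_from_words(words);
}
```

Passing the same words twice yields engines in the same state, so reproducibility is still available when it is wanted; it is just no longer limited to 2^32 starting points.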

Firstly, srand() doesn't get a seed, it sets a seed. Seeding is part of using any pseudo-random number generator (PRNG). Once seeded, the sequence of numbers that the PRNG produces from that seed is strictly deterministic, because (most?) computers have no means of generating true random numbers. Changing your PRNG won't stop the sequence from being repeatable from the seed and, indeed, this is a good thing, because the ability to reproduce the same sequence of pseudo-random numbers is often useful.

So if all PRNGs share this feature with rand(), why is rand() considered bad? Well, it comes down to the "pseudo" part of pseudo-random. We know that a PRNG can't be truly random, but we want it to behave as much like a true random number generator as possible, and there are various tests that can be applied to check how similar a PRNG sequence is to a true random sequence. Although its implementation is unspecified by the standard, rand() in every commonly used compiler uses a very old generation method suited to very weak hardware, and the results it produces fare poorly on these tests. Since then, many better random number generators have been created, and it is best to choose one suited to your needs rather than relying on the poor-quality one likely to be provided by rand().

Which one is suitable depends on what you are doing. You may need cryptographic quality, or multi-dimensional generation. But for the many uses where you simply want numbers that are fairly uniformly random and fast to generate, and where no money rides on the quality of the results, you likely want the xoroshiro128+ generator. Alternatively, you could use one of the generators in C++'s <random> header. The generators offered there are not state of the art, and much better ones are now available, but they are still good enough for most purposes and quite convenient.
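
For a sense of how small xoroshiro128+ is, here is a sketch of its core step. The rotation and shift constants (24, 16, 37) are written from memory of the 2018 revision of the public-domain reference code, so treat them as an assumption and check them against the reference before relying on this. The struct name and two-word constructor are choices made for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of xoroshiro128+; constants 24, 16, 37 are assumed (see text above).
struct Xoroshiro128Plus {
    std::uint64_t s0;
    std::uint64_t s1;

    Xoroshiro128Plus(std::uint64_t a, std::uint64_t b) : s0(a), s1(b) {
        assert((a | b) != 0);  // an all-zero state would stay zero forever
    }

    static std::uint64_t rotl(std::uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }

    std::uint64_t next() {
        const std::uint64_t result = s0 + s1;  // output: sum of the two state words
        const std::uint64_t t = s1 ^ s0;
        s0 = rotl(s0, 24) ^ t ^ (t << 16);     // uses the old s0 on the right-hand side
        s1 = rotl(t, 37);
        return result;
    }
};
```

In real use the two state words would normally be filled from a separate seeding generator or from std::random_device rather than from small hand-picked integers. Note also that this generator is not cryptographically secure.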

If money is on the line (e.g., card shuffling in an online casino), or you need cryptographic quality, you need to carefully investigate appropriate generators and ensure they exactly match your specific needs.

rand is usually (but not always), for historical reasons, a very bad pseudo-random number generator (PRNG). How bad it is depends on the implementation.

C++11 has nice, much better PRNGs. Use its <random> standard header. See notably the documentation for std::uniform_int_distribution, which has a nice example built on top of std::mersenne_twister_engine.

PRNGs are a very tricky subject. I know nothing about them, but I trust the experts.

Let me add another reason that makes rand() totally unusable: the standard does not define any characteristic of the random numbers it generates, neither the distribution nor, beyond requiring RAND_MAX to be at least 32767, the range.

Without a defined distribution, we can't even wrap it to get the distribution we want.

Going even further, in theory I could implement rand() by simply returning 0 every time. (RAND_MAX itself must be at least 32767, but nothing requires every value up to it to actually appear.)

Or even worse, I could make the least significant bit always 0, which doesn't violate the standard. Imagine someone writing code like if (rand()%2) ... .

Practically, rand() is implementation-defined, and the standard says:

There are no guarantees as to the quality of the random sequence produced and some implementations are known to produce sequences with distressingly non-random low-order bits. Applications with particular requirements should use a generator that is known to be sufficient for their needs.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf p36

If you use rand(), you get the same sequence of numbers every time for a given seed. So even after using srand(), the generated numbers are easy to predict if someone can guess the seed you used. This is because rand() uses a specific, deterministic algorithm to produce its numbers.

With some time to spare, you can figure out how to predict the numbers generated by the function, given the seed. All that remains is to guess the seed. Many people use the current time as the seed, so if I can guess the time at which you ran the application, I will be able to predict the numbers.
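
The guessing step described above can be sketched as a brute-force search over a window of candidate seeds. This is a hypothetical illustration: the function name recover_seed is invented, and it assumes the attacker runs the same rand() implementation as the target and has seen the first few outputs.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Try every seed in [lo, hi] and return the first one whose rand() stream
// starts with `observed`; returns -1 if none matches.
// Assumes the same rand() implementation as the program being predicted.
long long recover_seed(const std::vector<int>& observed,
                       unsigned lo, unsigned hi) {
    assert(lo <= hi && !observed.empty());
    // Wider loop variable so that hi == UINT_MAX cannot cause wrap-around.
    for (unsigned long long s = lo; s <= hi; ++s) {
        std::srand(static_cast<unsigned>(s));
        bool match = true;
        for (int v : observed) {
            if (std::rand() != v) { match = false; break; }
        }
        if (match) return static_cast<long long>(s);
    }
    return -1;
}
```

If the program was seeded with the current time in seconds, the window only needs to cover the plausible start times; a whole day is just 86,400 candidates. Once a matching seed is found, calling srand with it and replaying rand() reproduces every later "random" number as well.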

IT IS BAD TO USE RAND()!!!!