boost::random generates identical values too often from the same seed at different states

Problem description

Sometimes I get the same random number from a uniform distribution using a Mersenne Twister engine, even though I used the engine properly and iterated it. I know that the number of possible engine states is finite and that the number of possible generated values is also finite, but that does not explain what I see here.

Using Boost's implementation, I generate 1e6 uniformly distributed random values on the range [0; 1e7). That means there are far more possible values than random values required. However, I quite often get the same values, sometimes more than 100 times in this range. How is that possible?

Code

Simple code is provided to reproduce the situation. I get the same problem on both platforms:

  • MSVS 2019 with boost-random:x64-windows 1.71.0, and
  • g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 with libboost-dev 1.58.0.1ubuntu1

#include <iostream>
#include <chrono>
#include <vector>

#include <boost/random/mersenne_twister.hpp>          // random number generator
#include <boost/random/uniform_real_distribution.hpp> // uniform distribution generator
using namespace std;

int main()
{
    size_t seed = static_cast<int> (std::chrono::system_clock::now().time_since_epoch().count());
    cout << "seed = " << seed << endl;
    
    boost::random::mt19937 engine(seed);                         // the random number generator engine
    boost::random::uniform_real_distribution<double> u(0, 1e7);  // uniformly distributed double values on the range [0; 1e7)
    cout.precision(20);
    vector<double> history;                                      // stores the generated values for comparison
    for (size_t i = 0; i < 1e6; ++i)
    {
        history.push_back(u(engine));
        for (size_t j = 0; j < i; ++j)
            if (history[i] == history[j])
                cout << "Equal values ("<< history[i] <<") at ID = " << i << " and " << j << endl;
    }
}

Question

Is there a bug in the code that produces the same values? Or is it a bug in Boost?

For my task it is important to generate numbers with a uniform distribution. Finding identical values is one of the easiest tests, but there are many more, and I certainly don't want to do quality analysis on a well-known library like Boost. I didn't want to use the standard library, because it is not guaranteed that two different compilers will give the same sequence for the same seed values, and that was a requirement for the task. What kind of solution can you suggest?

Note

A strange behavior can be seen if one compares the generated values with the ones the standard library's <random> generates. For example, the values from boost::random for seed 4561565448989 are

1755586.0406719148159
3354420.976247638464   <--
3630764.0071026980877
3488445.2889673411846  <--
7920481.4555123448372
8773544.1024415194988  <--

while the standard library generates

3354420.9766563926823  <--
3488445.2898126943037  <--
8773544.1042856499553  <--
...

That is, every second generated value in Boost's sequence is very close to a corresponding value in the standard library's sequence. When two values in the Boost sequence are equal, the corresponding values in the standard library sequence are not equal, but close to each other. The similarity holds for both the MSVS and g++ compilers, which are entitled to have different implementations of Mersenne Twister and the distributions.


Update

Poor seed?

It was suggested that a poor seed value may cause this phenomenon, because with a size_t only 2^64 different initial states can be generated. Even worse, our lives are short, so the number of possible time values is even smaller. Although this is true, it doesn't explain why the same numbers are generated many times from different states. After all, the engine is initialized only once, so I chose one state from a 64-bit subset of all possible states.

A poor seed could be the reason if I initialized the engine multiple times and found identical values in the sequences of the differently (but not differently enough) initialized engines.

It is the distribution generator

If the standard MT engine is used with Boost's distribution, the problem persists. But if the engine is the one from Boost and the distribution is the standard one, the problem disappears. The problem is, as Peter pointed out, that the standard uniform distribution is platform dependent, which is why I use Boost.

Some statistics

I did a small analysis of the distributions. Using the same boost::random::mt19937 engine, but either Boost's or std's uniform_real_distribution<double> u(0, 1), I generated value pairs, investigated their differences, and plotted their correlation integral I(x), i.e. the probability that two values are closer than x. As U[0; 1) is a 1D domain, I(x) starts as a linear function for small x values (and tends to 1). The results are shown in the figure below.

[Figure: correlation integrals for std and boost, together with the expected values]

The figure shows that the distributions from the Boost implementation not only have a bias, but that there are only 4 possible distance values, whereas doubles are known to be much denser, and std indeed produces a larger spectrum of distance values.
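
For reference, the expected curve can be written in closed form. This is a short derivation, assuming the two values $U$ and $V$ of a pair are independent and exactly uniform on $[0, 1)$; the post does not say how the pairs were formed, so that independence is an assumption.

$$
I(x) = P(|U - V| < x) = 1 - (1 - x)^2 = 2x - x^2, \qquad 0 \le x \le 1 .
$$

So $I(x) \approx 2x$ for small $x$ and $I(1) = 1$, which matches the linear start and the limit of 1 described above.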

Bug or not a bug? A deleted answer

A since-deleted answer suggested improving the seed values, but so far that has turned out not to be the source of the issue. Since then I have also posted this issue on Boost's GitHub, and it is still not clear where the problem lies. It may be a bug in Boost, but even in that case this SO question can help others identify issues in their distribution generators.

This is not a bug in Boost. The problem is due to the limited resolution provided by the older, 32-bit Mersenne Twister. The steps you are seeing in the cumulative distribution are equal to $1/2^{32} \approx 2.3 \times 10^{-10}$. A couple of years ago I was made aware of a costly, real-world failure of a simulation that resulted from this. The solution is to use an RNG that is capable of producing full-precision doubles while passing all statistical test suites, such as MersenneTwister64 or MIXMAX.

There is no bug in Boost.

The random engine mt19937 (both in Boost and in the C++ standard library) has a state of 19968 bits, so there are 2^19968 different variations of that state. Giving that engine a seed of only 32 or 64 bits (depending on the size of int) selects from a severely limited subset of these variations.

Moreover, seeding the engine with timestamps (or with sequential or linearly related numbers) is generally not appropriate for Mersenne Twister or other PRNGs, as doing so can cause correlated random number sequences (especially given that Mersenne Twister is a linear PRNG).


"I didn't want to use the standard library, because it is not guaranteed that two different compilers will give the same sequence for the same seed values, but it was a requirement for the task."

In the C++ standard library, random engines (such as std::mt19937) are guaranteed to have the same output for the same input on all implementations of the C++ standard library. This is in contrast to random distributions (such as std::uniform_int_distribution), which use algorithms that may vary between implementations of the C++ standard library.

See also "Random number generator performance varies between platforms" and "Random output different between implementations".
