
Does std::mt19937 require warmup?

I've read that many pseudo-random number generators require many samples in order to be "warmed up". Is that the case when using std::random_device to seed std::mt19937, or can we expect that it's ready after construction? The code in question:

#include <random>
std::random_device rd;
std::mt19937 gen(rd());

Mersenne Twister is a shift-register-based pRNG (pseudo-random number generator) and is therefore subject to bad seeds: states with long runs of 0s or 1s lead to relatively predictable results until the internal state is mixed up enough.

However, the constructor that takes a single value applies a complicated function to that seed value, designed to minimize the likelihood of producing such 'bad' states. There's a second way to initialize mt19937, where you directly set the internal state via an object conforming to the SeedSequence concept. It's with this second method of initialization that you may need to be concerned about choosing a 'good' state or doing warmup.
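For reference, the "complicated function" used by the single-value constructor is a multiplicative recurrence that the standard specifies for mersenne_twister_engine. Here is a minimal sketch of it with the 32-bit mt19937 parameters; expand_seed is a name made up for this illustration, not a library function.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of <random>): expands one 32-bit seed into
// 624 state words using the recurrence specified for mt19937,
//   x[i] = 1812433253 * (x[i-1] ^ (x[i-1] >> 30)) + i   (mod 2^32).
inline std::array<std::uint32_t, 624> expand_seed(std::uint32_t seed) {
    std::array<std::uint32_t, 624> x{};
    x[0] = seed;
    for (std::size_t i = 1; i < x.size(); ++i) {
        const std::uint32_t prev = x[i - 1];
        // Unsigned 32-bit arithmetic wraps around, which gives the mod 2^32.
        x[i] = 1812433253u * (prev ^ (prev >> 30)) + static_cast<std::uint32_t>(i);
    }
    return x;
}
```

Even a seed of 0 gives x[1] == 1 and x[2] == 1812433255, so the state does not stay at zero.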


The standard includes an object conforming to the SeedSequence concept, called seed_seq. seed_seq takes an arbitrary number of input seed values and then performs certain operations on them in order to produce a sequence of different values suitable for directly setting the internal state of a pRNG.
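To see that expansion in isolation, you can call seed_seq::generate yourself; that is the same member an engine calls when it is seeded. A small sketch, where expand_with_seed_seq is a hypothetical helper name:

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <random>
#include <vector>

// Hypothetical helper: asks a std::seed_seq built from `inputs` for
// `count` 32-bit words, the same request an engine makes when seeded.
inline std::vector<std::uint32_t> expand_with_seed_seq(
        std::initializer_list<std::uint32_t> inputs, std::size_t count) {
    std::seed_seq seq(inputs);
    std::vector<std::uint32_t> out(count);
    seq.generate(out.begin(), out.end());  // deterministic for given inputs
    return out;
}
```

Calling expand_with_seed_seq({1, 2, 3, 4, 5}, 624) twice returns the same 624 words, since seed_seq adds no randomness of its own.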

Here's an example of loading up a seed sequence with enough random data to fill the entire std::mt19937 state:

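#include <algorithm>
#include <array>
#include <functional>
#include <random>

// One word per state element; state_size is 624 for std::mt19937.
std::array<std::random_device::result_type, std::mt19937::state_size> seed_data;
std::random_device r;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));

std::mt19937 eng(seq);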

This ensures that the entire state is randomized. Also, each engine specifies how much data it reads from the seed sequence, so you may want to read the docs to find that info for whatever engine you use.
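For the Mersenne Twister family specifically, that amount can be computed from the engine's own constants: the seed-sequence constructor requests state_size * ceil(word_size / 32) words of 32 bits each. A sketch, assuming a mersenne_twister_engine instantiation; seed_words_needed is a hypothetical name, and other engine types use different formulas.

```cpp
#include <cstddef>
#include <random>

// Hypothetical helper, valid for std::mersenne_twister_engine
// instantiations only: number of 32-bit words the seed-sequence
// constructor asks for.
template <typename Engine>
constexpr std::size_t seed_words_needed() {
    return Engine::state_size * ((Engine::word_size + 31) / 32);
}

static_assert(seed_words_needed<std::mt19937>() == 624, "624 x 32-bit words");
static_assert(seed_words_needed<std::mt19937_64>() == 624, "312 x 64-bit words");
```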

Although here I load up the seed_seq entirely from std::random_device, seed_seq is specified such that just a few numbers that aren't particularly random should work well. For example:

std::seed_seq seq{1, 2, 3, 4, 5};
std::mt19937 eng(seq);

In the comments below, Cubbi indicates that seed_seq works by performing a warmup sequence for you.

Here's what should be your 'default' for seeding:

std::random_device r;
std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
std::mt19937 rng(seed);

If you seed with just one 32-bit value, all you will ever get is one of the same 2^32 trajectories through state-space. If you use a PRNG with KiBs of state, then you should probably seed all of it. As described in the comments to @bames63's answer, using std::seed_seq is probably not a good idea if you want to init the whole state with random numbers. Sadly, std::random_device does not conform to the SeedSequence concept, but you can write a wrapper that does:

#include <algorithm>
#include <functional>
#include <iostream>
#include <random>

// Minimal adapter: provides only what the engine constructor actually uses.
class random_device_wrapper {
    std::random_device *m_dev;
public:
    using result_type = std::random_device::result_type;
    explicit random_device_wrapper(std::random_device &dev) : m_dev(&dev) {}
    // Fill [first, last) with values drawn straight from the device.
    template <typename RandomAccessIterator>
    void generate(RandomAccessIterator first, RandomAccessIterator last) {
        std::generate(first, last, std::ref(*m_dev));
    }
};

int main() {
    std::random_device rd;  // not copyable, so declare it directly
    auto seedseq = random_device_wrapper{rd};
    auto mt = std::mt19937{seedseq};
    for (auto i = 100; i; --i)
        std::cout << mt() << '\n';
}

This works at least until you enable concepts. Depending on whether your compiler knows about SeedSequence as a C++20 concept, it may fail to work because we're supplying only the missing generate() method, nothing else. In duck-typed template programming, that code is sufficient, though, because the PRNG does not store the seed sequence object.
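If a stricter check ever rejects the minimal wrapper, one option is to supply the remaining members that the SeedSequence requirements list (constructors, size(), param()) as stubs. The sketch below covers only the syntactic shape, not the semantics: the constructors ignore their arguments, and size() and param() report nothing. The class name and make_seeded_engine are made up for this example, and unlike the wrapper above this variant owns its std::random_device.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <random>

// Hypothetical class, not part of the standard library.
class random_device_seed_seq {
    std::random_device m_dev;  // owned, so this class is not copyable
public:
    using result_type = std::uint_least32_t;

    random_device_seed_seq() = default;
    // Stub constructors: accepted for interface compatibility, inputs ignored.
    template <typename InputIt>
    random_device_seed_seq(InputIt, InputIt) {}
    random_device_seed_seq(std::initializer_list<result_type>) {}

    // Fill [first, last) directly from the device, without seed_seq's mixing.
    template <typename RandomAccessIt>
    void generate(RandomAccessIt first, RandomAccessIt last) {
        std::generate(first, last,
                      [this] { return static_cast<result_type>(m_dev()); });
    }

    // Stubs: there are no stored seed parameters to report.
    std::size_t size() const { return 0; }
    template <typename OutputIt>
    void param(OutputIt) const {}
};

// Hypothetical convenience function: an mt19937 whose whole state
// comes from std::random_device.
inline std::mt19937 make_seeded_engine() {
    random_device_seed_seq seq;
    return std::mt19937(seq);
}
```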

I believe there are situations where MT can be seeded "poorly", which results in non-optimal sequences. If I remember correctly, seeding with all zeros is one such case. I would recommend trying the WELL generators if this is a serious issue for you. I believe they are more flexible: the quality of the seed does not matter as much. (Perhaps to answer your question more directly: it's probably more efficient to focus on seeding well than to seed poorly and then generate a bunch of samples to get the generator to an optimal state.)
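If you do want an explicit warmup anyway, the engine's discard() member advances the state as if that many values had been generated and thrown away. A minimal sketch; make_warmed_engine is a hypothetical helper, and the right warmup length is left to the caller because none of the answers above specify one.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: seeds an engine with a single value, then skips
// `warmup` outputs before handing the engine to the caller.
inline std::mt19937 make_warmed_engine(std::uint32_t seed,
                                       unsigned long long warmup) {
    std::mt19937 eng(seed);
    eng.discard(warmup);  // same effect as calling eng() `warmup` times
    return eng;
}
```

As a sanity check, the standard requires the 10000th output of a default-constructed mt19937 (seed 5489) to be 4123659995, so make_warmed_engine(5489u, 9999)() should return exactly that value.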


 