Jumping ahead in a parallel PRNG in C++

Question

I am implementing a Monte Carlo simulation, where I need to run multiple realisations of some dynamics and then take an average over the end state of all the simulations. Since the number of realisations is large, I run them in parallel using OpenMP. Every realisation starts from the same initial conditions, and at each time step a process happens with a given probability; to determine which process, I draw a random number from a uniform distribution.

I want to make sure that all simulations are statistically independent and that there is no overlap in the random numbers that are being drawn.

I use OpenMP to parallelise the for loops, so the skeleton code looks like this:

vector<int> data(number_of_sims);
#pragma omp parallel for
for(int i = 0; i < number_of_sims; i++){

    // run sim
    double t = 0;  // declared inside the loop so threads do not share it
    while (t < T) {
        double r = draw_random_uniform();
        if (r < p) do_something();
        else do_something_else();
        t += 1.0;  // increment time
    }

    // some calculation
    data[i] = calculate();
}

So every time I want a random number, I call a function which uses the Mersenne Twister seeded from std::random_device.

double draw_random_uniform(){
   static thread_local auto seed = std::random_device{}();
   static thread_local mt19937 mt(seed);
   std::uniform_real_distribution<double> distribution(0.0, 1.0);
   double r = distribution(mt);
   return r;
}

However, since I ultimately want to run this code on a high-performance computing cluster, I want to avoid using std::random_device, as it is risky on systems with little entropy.

So instead I want to create an initial random number generator and then jump it forward by a large amount for each of the threads. I have been attempting to do this with the xoshiro256+ PRNG (I found a good implementation here: https://github.com/Reputeless/Xoshiro-cpp ). Something like this, for example:

XoshiroCpp::Xoshiro256Plus prng(42);  // properly seeded prng
#pragma omp parallel num_threads()
{
    static thread_local XoshiroCpp::Xoshiro256Plus lprng(prng);  // thread local copy
    lprng.longJump();  // jump ahead

    // code as before, except use lprng to generate random numbers
    # pragma omp for
    ....
}

However, I cannot get such an implementation to work. I suspect this is because of the double OpenMP for loops. I had the thought of pre-generating all of the PRNGs and storing them in a container, then accessing the relevant one by using omp_get_thread_num() inside the parallelised for loop.
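A minimal sketch of that pre-generation idea, assuming only that the generator type is copyable and has a longJump() member as in the snippet above. In that snippet every thread copies the same prng and calls longJump() exactly once, so all threads end up in the same state; here the master is jumped between copies instead. ToyJumpRng and make_streams are hypothetical names, and ToyJumpRng is a stand-in used only to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in generator for illustration only: its "state" is just a position
// in a notional sequence, and longJump() moves that position far ahead.
// A real generator such as XoshiroCpp::Xoshiro256Plus would take its place.
struct ToyJumpRng {
    std::uint64_t position = 0;
    void longJump() { position += (std::uint64_t{1} << 32); }
};

// Build n generators serially: stream k is the master after k long jumps,
// so no two streams start from the same state.
template <class Rng>
std::vector<Rng> make_streams(Rng master, std::size_t n) {
    std::vector<Rng> streams;
    streams.reserve(n);
    for (std::size_t k = 0; k < n; ++k) {
        streams.push_back(master);  // copy the current state
        master.longJump();          // advance the master for the next stream
    }
    return streams;
}
```

With n set to omp_get_max_threads(), each thread would then draw from streams[omp_get_thread_num()] inside the parallel loop.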

I am unsure if this is the best way to go about doing all this. Any advice is appreciated.

Answer 1

Coordinating random number generators with long jumps can be tricky. Alternatively, there is a much simpler method.

Here is a quote from the author's website:

It is however important that the period is long enough. Moreover, if you run n independent computations starting at random seeds, the sequences used by each computation should not overlap.

Now, given a generator with period P, the probability that subsequences of length L starting at random points in the state space overlap is bounded by n² L/P. If your generator has period 2^256 and you run on 2^64 cores (you will never have them) a computation using 2^64 pseudorandom numbers (you will never have the time) the probability of overlap would be less than 2^-64.
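Spelling out the arithmetic behind the quoted figure, with n = 2^64 computations, L = 2^64 numbers per computation and P = 2^256:

```latex
\frac{n^{2} L}{P}
  = \frac{\left(2^{64}\right)^{2} \cdot 2^{64}}{2^{256}}
  = \frac{2^{192}}{2^{256}}
  = 2^{-64}
```

The bound is stated for starting points chosen at random in the state space, so it is only as good as the seeding.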

So instead of trying to coordinate, you could just randomly seed a new generator from std::random_device{} in each thread. The period is so large that the sequences will not collide.

While this sounds like a very ad hoc approach, this random-seeding method is actually a widely used and classic method.

You just need to make sure the seeds are different. Depending on the platform, different ways of obtaining the seeds are usually proposed:

Using a truly random source
Having an atomic int that is incremented and some hashing
Using another pseudo random number generator to generate a seed sequence
Using a combination of thread id and time to create a seed
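As an illustration of the second option, here is a minimal sketch in which an atomic counter hands out stream ids and std::seed_seq does the scrambling. The helper names next_stream_id and make_generator, and the base_seed parameter, are assumptions made for this example and do not come from any library.

```cpp
#include <atomic>
#include <cstdint>
#include <random>

// Hypothetical helper: hands out a fresh stream id on every call.
inline std::uint32_t next_stream_id() {
    static std::atomic<std::uint32_t> counter{0};
    return counter.fetch_add(1, std::memory_order_relaxed);
}

// Hypothetical helper: std::seed_seq scrambles (base_seed, stream_id)
// into the full generator state, playing the role of the hash.
inline std::mt19937 make_generator(std::uint32_t base_seed,
                                   std::uint32_t stream_id) {
    std::seed_seq seq{base_seed, stream_id};
    return std::mt19937(seq);
}
```

Passing the loop index i as stream_id instead of a value from the atomic counter would make each simulation reproducible regardless of how iterations are scheduled across threads.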

If repeatability is not needed, seeding from a random source is the easiest and safest solution.

The 2017 paper by L'Ecuyer et al. gives a good overview of methods for generating parallel streams. It calls this approach "RNG with a 'random' seed for each stream" in chapter 4.

vector<int> data(number_of_sims);
#pragma omp parallel for
for(int i = 0; i < number_of_sims; i++){
    // random 128 bit seed
    std::random_device rd;
    std::seed_seq seed{rd(), rd(), rd(), rd()};
    std::mt19937 mt(seed);

    // run sim
    double t = 0;  // local to the iteration, so threads do not share it
    while (t < T) {
        double r = draw_random_uniform(mt);
        if (r < p) do_something();
        else do_something_else();
        t += 1.0;  // increment time
    }

    // some calculation
    data[i] = calculate();
}

and

double draw_random_uniform(mt19937 &mt){
   std::uniform_real_distribution<double> distribution(0.0, 1.0);
   return distribution(mt);
}

If number_of_sims is not extremely large, there is no need for static or thread_local initialization.
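If the cost of seeding a generator in every iteration did become noticeable, one possible variant is to construct the generator once per thread in an enclosing parallel region and share the loop with a plain omp for. The sketch below assumes a simplified stand-in for the simulation body (it just records the fraction of steps with r < p); run_sims and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Returns, for each simulation, the fraction of steps in which r < p.
std::vector<double> run_sims(int number_of_sims, int steps, double p) {
    std::vector<double> data(static_cast<std::size_t>(number_of_sims));

    #pragma omp parallel
    {
        // Constructed once per thread, not once per iteration.
        std::random_device rd;
        std::seed_seq seed{rd(), rd(), rd(), rd()};
        std::mt19937 mt(seed);
        std::uniform_real_distribution<double> uniform(0.0, 1.0);

        // The iterations are divided among the threads of the region;
        // each thread keeps drawing from its own generator.
        #pragma omp for
        for (int i = 0; i < number_of_sims; ++i) {
            int hits = 0;
            for (int t = 0; t < steps; ++t) {
                if (uniform(mt) < p) ++hits;
            }
            data[static_cast<std::size_t>(i)] =
                static_cast<double>(hits) / steps;
        }
    }
    return data;
}
```

Because it only uses pragmas and no OpenMP library calls, the same code also compiles and runs serially when OpenMP is disabled.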

Answer 2

You should read "Parallel Random Numbers: As Easy as 1, 2, 3" ( http://www.thesalmons.org/john/random123/papers/random123sc11.pdf ). This paper explicitly addresses your forward stepping issues. You can now find implementations of this generator in maths libraries (such as Intel's MKL, which uses the specialized encryption instructions, so it will be hard to beat by hand!).
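To illustrate the idea behind counter-based generators, here is a minimal sketch in which each random number is a pure function of a key (for example the simulation index) and a counter (for example the time step), so no state has to be jumped or shared. It uses the SplitMix64 mixing function as a stand-in and is not one of the generators from the paper (Philox, Threefry, ARS); mix64 and counter_uniform are hypothetical names, and the statistical quality of this particular combination is not claimed.

```cpp
#include <cstdint>

// SplitMix64 step: add the golden-ratio constant, then apply the finalizer.
inline std::uint64_t mix64(std::uint64_t z) {
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Stateless draw: the same (key, counter) pair always gives the same value.
// The top 53 bits are scaled by 2^-53 to give a double in [0, 1).
inline double counter_uniform(std::uint64_t key, std::uint64_t counter) {
    const std::uint64_t bits = mix64(mix64(key) ^ counter);
    return static_cast<double>(bits >> 11) * (1.0 / 9007199254740992.0);
}
```

In the simulation loop this would be called as counter_uniform(i, step), which makes every realisation reproducible independently of the thread that runs it.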

Answer 1
1 2023-01-31 14:33:55

Answer 2
0 2023-01-31 16:10:26