
Jumping ahead in parallelised PRNGs in C++

I am implementing a Monte Carlo simulation in which I need to run many realisations of some dynamics and then average the end state over all the simulations. Since the number of realisations is large, I run them in parallel using OpenMP. Every realisation starts from the same initial conditions; at each time step a process happens with a given probability, and to determine which process occurs I draw a random number from a uniform distribution.

I want to make sure that all simulations are statistically independent and that there is no overlap in the random numbers being drawn.

I use OpenMP to parallelise the for loop, so the skeleton code looks like this:

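std::vector<int> data(number_of_sims);
#pragma omp parallel for
for (int i = 0; i < number_of_sims; i++) {
    // t and r are declared inside the loop so each thread has its own copy;
    // declared outside the parallel region they would be shared (a data race)
    double t = 0.0;
    double r;

    // run sim
    while (t < T) {
        r = draw_random_uniform();
        if (r < p) do_something();
        else do_something_else();
        t += 1.0;  // increment time
    }

    // some calculation
    data[i] = calculate();
}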

So every time I want a random number, I call a function that uses a Mersenne Twister seeded with std::random_device:

#include <random>

double draw_random_uniform() {
    // one generator per thread, seeded once from std::random_device
    static thread_local auto seed = std::random_device{}();
    static thread_local std::mt19937 mt(seed);
    std::uniform_real_distribution<double> distribution(0.0, 1.0);
    return distribution(mt);
}

However, since I ultimately want to run this code on a high-performance computing cluster, I want to avoid using std::random_device, as it is risky on systems with little entropy.

So instead I want to create an initial random number generator and then jump it forward by a large amount for each thread. I have been attempting to do this with the xoshiro256+ PRNG (I found a good implementation here: https://github.com/Reputeless/Xoshiro-cpp). Something like this, for example:

XoshiroCpp::Xoshiro256Plus prng(42);  // properly seeded prng
#pragma omp parallel
{
    static thread_local XoshiroCpp::Xoshiro256Plus lprng(prng);  // thread local copy
    lprng.longJump();  // jump ahead

    // code as before, except use lprng to generate random numbers
    #pragma omp for
    ....
}

However, I cannot get such an implementation to work. I suspect this is because of the two nested OpenMP directives (the parallel region and the for loop inside it). I had the thought of pre-generating all of the PRNGs and storing them in a container, then accessing the relevant one by using omp_get_thread_num() inside the parallelised for loop.

I am unsure if this is the best way to go about all this. Any advice is appreciated.
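A sketch of the pre-generation idea from the question, under stated assumptions. It does not use the Xoshiro-cpp class: it transcribes xoshiro256+ and its jump() (equivalent to 2^128 calls to next()) from Blackman and Vigna's public-domain reference C code, so check the constants against that source before relying on them. One likely reason the snippet above misbehaves is that every thread copies the same prng and then calls longJump() exactly once, so all threads end up in the same state. Here generator k is the base generator jumped k times instead. make_streams and run_sims are hypothetical names, and uniform() stands in for the real simulation.

```cpp
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Minimal xoshiro256+ following the public-domain reference code (not Xoshiro-cpp).
struct Xoshiro256Plus {
    std::uint64_t s[4];

    static std::uint64_t rotl(std::uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }

    // Fill the 256-bit state from a 64-bit seed with SplitMix64, as the authors suggest.
    explicit Xoshiro256Plus(std::uint64_t seed) {
        for (auto& word : s) {
            std::uint64_t z = (seed += 0x9e3779b97f4a7c15ULL);
            z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
            z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
            word = z ^ (z >> 31);
        }
    }

    std::uint64_t next() {
        const std::uint64_t result = s[0] + s[3];
        const std::uint64_t t = s[1] << 17;
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = rotl(s[3], 45);
        return result;
    }

    // Uniform double in [0, 1) built from the top 53 bits.
    double uniform() { return (next() >> 11) * (1.0 / 9007199254740992.0); }

    // Advance the state by 2^128 steps (constants from the reference implementation).
    void jump() {
        static const std::uint64_t JUMP[] = {0x180ec6d33cfd0abaULL, 0xd5a61266f0c9392cULL,
                                             0xa9582618e03fc9aaULL, 0x39abdc4529b1661cULL};
        std::uint64_t acc[4] = {0, 0, 0, 0};
        for (std::uint64_t j : JUMP)
            for (int b = 0; b < 64; b++) {
                if (j & (std::uint64_t{1} << b))
                    for (int k = 0; k < 4; k++) acc[k] ^= s[k];
                next();
            }
        for (int k = 0; k < 4; k++) s[k] = acc[k];
    }
};

// Pre-generate one generator per thread; consecutive streams start 2^128 steps apart.
std::vector<Xoshiro256Plus> make_streams(std::uint64_t seed, int n) {
    std::vector<Xoshiro256Plus> streams;
    Xoshiro256Plus g(seed);
    for (int k = 0; k < n; k++) {
        streams.push_back(g);
        g.jump();
    }
    return streams;
}

// Hypothetical driver: each iteration draws from its own thread's stream.
std::vector<double> run_sims(std::uint64_t seed, int number_of_sims) {
    int n_threads = 1;
#ifdef _OPENMP
    n_threads = omp_get_max_threads();
#endif
    std::vector<Xoshiro256Plus> streams = make_streams(seed, n_threads);
    std::vector<double> data(number_of_sims);
    #pragma omp parallel for
    for (int i = 0; i < number_of_sims; i++) {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // each thread only touches streams[tid], so no locking is needed
        data[i] = streams[tid].uniform();  // stand-in for the real simulation
    }
    return data;
}
```

Because each thread keeps using one stream for all the iterations it is assigned, which numbers a given simulation receives depends on the OpenMP schedule; the streams themselves still never overlap in practice.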

Coordinating random number generators with long jumps can be tricky. Alternatively, there is a much simpler method.

Here is a quote from the author's website:

It is however important that the period is long enough. Moreover, if you run n independent computations starting at random seeds, the sequences used by each computation should not overlap.

Now, given a generator with period P, the probability that subsequences of length L starting at random points in the state space overlap is bounded by n²L/P. If your generator has period 2^256 and you run on 2^64 cores (you will never have them) a computation using 2^64 pseudorandom numbers (you will never have the time), the probability of overlap would be less than 2^-64.
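Plugging the quoted numbers (n = 2^64 cores, L = 2^64 numbers each, P = 2^256) into the bound gives the stated figure:

```latex
\Pr[\text{overlap}] \le \frac{n^2 L}{P}
  = \frac{\left(2^{64}\right)^2 \cdot 2^{64}}{2^{256}}
  = 2^{128 + 64 - 256}
  = 2^{-64}
```

For std::mt19937 the period is 2^19937 − 1, so the same bound is vanishingly small for any realistic n and L.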

So instead of trying to coordinate, you could simply seed a new generator in each thread from std::random_device{}. The period is so large that the streams will not collide.

While this sounds like a very ad hoc approach, this random-seeding method is actually a widely used, classic method.

You just need to make sure the seeds are different. Depending on the platform, different ways of generating the seeds are usually proposed:

  • Using a truly random source
  • Having an atomic int that is incremented and some hashing
  • Using another pseudo random number generator to generate a seed sequence
  • Using a combination of thread id and time to create a seed
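A minimal sketch of the second option (an atomic counter plus hashing), assuming the SplitMix64 finalizer as the hash; mix64, seed_counter, next_seed and make_generator are hypothetical names. Every step of that finalizer (add a constant, xor-shift, multiply by an odd constant) is invertible, so distinct counter values always yield distinct seeds.

```cpp
#include <atomic>
#include <cstdint>
#include <random>

// SplitMix64 finalizer used as a bijective hash: distinct inputs give distinct outputs.
std::uint64_t mix64(std::uint64_t z) {
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Shared by all threads; fetch_add hands every caller a unique value.
std::atomic<std::uint64_t> seed_counter{0};

// `base` could come from the clock, a job id or std::random_device (assumption).
std::uint64_t next_seed(std::uint64_t base) {
    return mix64(base ^ seed_counter.fetch_add(1, std::memory_order_relaxed));
}

// Call once per thread (or per simulation) to get an independently seeded generator.
std::mt19937_64 make_generator(std::uint64_t base) {
    return std::mt19937_64(next_seed(base));
}
```

Which thread receives which counter value depends on scheduling, so this guarantees distinct seeds but not run-to-run reproducibility.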

If repeatability is not needed, seeding from a random source is the easiest and safest solution.
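When repeatability is needed, one common alternative is to derive each realisation's seed from a fixed master seed and the simulation index, here via std::seed_seq. A sketch under that assumption; run_one_sim is a hypothetical stand-in for the dynamics (it just counts draws below p). Because the seed depends only on (master_seed, i), simulation i gets the same numbers whichever thread runs it and however many threads there are.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for one realisation: counts how often r < p in `steps` draws.
int run_one_sim(std::mt19937& mt, double p, int steps) {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    int hits = 0;
    for (int step = 0; step < steps; step++)
        if (dist(mt) < p) hits++;
    return hits;
}

// Seed realisation i from (master_seed, i): reproducible and independent of scheduling.
std::vector<int> run_sims(std::uint32_t master_seed, int number_of_sims, double p, int steps) {
    std::vector<int> data(number_of_sims);
    #pragma omp parallel for
    for (int i = 0; i < number_of_sims; i++) {
        std::seed_seq seq{master_seed, static_cast<std::uint32_t>(i)};
        std::mt19937 mt(seq);
        data[i] = run_one_sim(mt, p, steps);
    }
    return data;
}
```

This does not formally rule out overlapping streams, but by the n²L/P bound quoted above the chance is negligible. Note that std::uniform_real_distribution may produce different values across standard library implementations, so results are reproducible for a given build.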

The 2017 paper by L'Ecuyer et al. gives a good overview of methods for generating parallel streams. In chapter 4 he calls this approach "RNG with a 'random' seed for each stream".

std::vector<int> data(number_of_sims);
#pragma omp parallel for
for (int i = 0; i < number_of_sims; i++) {
    // random 128-bit seed (four 32-bit values from std::random_device)
    std::random_device rd;
    std::seed_seq seed{rd(), rd(), rd(), rd()};
    std::mt19937 mt(seed);

    // run sim (t and r are local, so each thread has its own copy)
    double t = 0.0;
    double r;
    while (t < T) {
        r = draw_random_uniform(mt);
        if (r < p) do_something();
        else do_something_else();
        t += 1.0;  // increment time
    }

    // some calculation
    data[i] = calculate();
}

and

double draw_random_uniform(std::mt19937 &mt) {
    std::uniform_real_distribution<double> distribution(0.0, 1.0);
    return distribution(mt);
}

If number_of_sims is not extremely large, there is no need for static or thread_local initialization.

You should read "Parallel Random Numbers: As Easy as 1, 2, 3" (http://www.thesalmons.org/john/random123/papers/random123sc11.pdf). This paper explicitly addresses your jump-ahead issues. You can now find implementations of these generators in maths libraries (such as Intel's MKL, which uses the specialized encryption instructions, so they will be hard to beat by hand!).
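The key idea of counter-based generators is that the n-th output is a pure function of (key, n), so "jumping ahead" is just picking n. The toy below illustrates only that idea; it is NOT Philox or Threefry from the paper, but a SplitMix64-style mix of key and counter, and CounterRng is a hypothetical name. Since the counter is scaled by an odd constant and the mixing is invertible, outputs never repeat within one key over 2^64 counter values.

```cpp
#include <cstdint>

// Toy counter-based generator: output n = mix(key, n), so skipping ahead is O(1).
struct CounterRng {
    std::uint64_t key;
    std::uint64_t counter;

    explicit CounterRng(std::uint64_t k, std::uint64_t start = 0) : key(k), counter(start) {}

    // SplitMix64-style mixing of key and counter (bijective in the counter for a fixed key).
    static std::uint64_t mix(std::uint64_t k, std::uint64_t n) {
        std::uint64_t z = k + (n + 1) * 0x9e3779b97f4a7c15ULL;
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return z ^ (z >> 31);
    }

    std::uint64_t operator()() { return mix(key, counter++); }

    // Jump ahead by n outputs without generating them.
    void skip(std::uint64_t n) { counter += n; }

    // Uniform double in [0, 1) from the top 53 bits.
    double uniform() { return ((*this)() >> 11) * (1.0 / 9007199254740992.0); }
};
```

For the question's loop, simulation i could use CounterRng(key, i * stride), giving each simulation a disjoint counter range as long as it draws at most stride numbers (stride is an assumed bound you would pick).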

Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.