
Jumping ahead in parallelised PRNGs in C++

I am implementing a Monte Carlo simulation in which I need to run multiple realisations of some dynamics and then average over the end states of all the simulations. Since the number of realisations is large, I run them in parallel using OpenMP. Every realisation starts from the same initial conditions, and at each time step a process happens with a given probability; to decide which process occurs, I draw a random number from a uniform distribution.

I want to make sure that all simulations are statistically independent and that there is no overlap in the random numbers that are being drawn.

I use OpenMP to parallelise the for loops, so the skeleton code looks like this:

vector<int> data(number_of_sims);
#pragma omp parallel for
for(int i = 0; i < number_of_sims; i++){

    // run sim
    double t = 0;  // declared inside the loop so each thread has its own copy
    while (t < T) {
        double r = draw_random_uniform();
        if (r < p) do_something();
        else do_something_else();
        t += 1.0;  // increment time
    }

    // some calculation
    data[i] = calculate();
}

So every time I want a random number, I call a function that uses a Mersenne Twister seeded with std::random_device.

double draw_random_uniform(){
   static thread_local auto seed = std::random_device{}();
   static thread_local std::mt19937 mt(seed);
   std::uniform_real_distribution<double> distribution(0.0, 1.0);
   return distribution(mt);
}

However, since I ultimately want to run this code on a high-performance computing cluster, I want to avoid using std::random_device, as it is risky on systems with little entropy.

So instead I want to create one initial random number generator and then jump it forward by a large amount for each thread. I have been attempting to do this with the Xoshiro256+ PRNG (I found a good implementation here: https://github.com/Reputeless/Xoshiro-cpp ). Something like this, for example:

XoshiroCpp::Xoshiro256Plus prng(42);  // properly seeded prng
#pragma omp parallel
{
    static thread_local XoshiroCpp::Xoshiro256Plus lprng(prng);  // thread local copy
    lprng.longJump();  // jump ahead

    // code as before, except use lprng to generate random numbers
    #pragma omp for
    ....
}

However, I cannot get such an implementation to work, which I suspect is because of the combination of the parallel region and the nested OpenMP for loop. I also had the thought of pre-generating all of the PRNGs, storing them in a container, and then accessing the relevant one using omp_get_thread_num() inside the parallelised for loop.

I am unsure if this is the best way to go about doing all this. Any advice is appreciated.

Coordinating random number generators with longJump() can be tricky; there is, however, a much simpler alternative.

Here is a quote from the author's website:

It is however important that the period is long enough. Moreover, if you run n independent computations starting at random seeds, the sequences used by each computation should not overlap.

Now, given a generator with period P, the probability that subsequences of length L starting at random points in the state space overlap is bounded by n²L/P. If your generator has period 2^256 and you run on 2^64 cores (you will never have them) a computation using 2^64 pseudorandom numbers (you will never have the time) the probability of overlap would be less than 2^-64.

So instead of trying to coordinate the generators, you can simply seed a new generator from std::random_device{} in each thread. The period is so large that the streams will almost surely not overlap.

While this sounds like a very ad-hoc approach, this random-seeding method is actually a widely used and classic method.

You just need to make sure the seeds are different. Depending on the platform, different seeding strategies are commonly proposed:

  • Using a truly random source
  • Incrementing an atomic integer and hashing the result
  • Using another pseudo-random number generator to generate a seed sequence
  • Combining the thread id and the time to create a seed

If repeatability is not needed, seeding from a random source is the easiest and safest solution.

The paper by L'Ecuyer et al. from 2017 gives a good overview of methods for generating parallel streams. It calls this approach "RNG with a 'random' seed for each stream" in chapter 4.

vector<int> data(number_of_sims);
#pragma omp parallel for
for(int i = 0; i < number_of_sims; i++){
    // random 128 bit seed
    auto rd = std::random_device{};
    auto seed = std::seed_seq{rd(), rd(), rd(), rd()};
    auto mt = std::mt19937{seed};

    // run sim
    double t = 0;  // local to each iteration, so no data race between threads
    while (t < T) {
        double r = draw_random_uniform(mt);
        if (r < p) do_something();
        else do_something_else();
        t += 1.0;  // increment time
    }

    // some calculation
    data[i] = calculate();
}

and

double draw_random_uniform(std::mt19937 &mt){
   std::uniform_real_distribution<double> distribution(0.0, 1.0);
   return distribution(mt);
}

If number_of_sims is not extremely large, there is no need for static or thread_local initialisation.

You should read "Parallel Random Numbers: As Easy as 1, 2, 3" ( http://www.thesalmons.org/john/random123/papers/random123sc11.pdf ). This paper explicitly addresses your jump-ahead issues. You can now find implementations of these counter-based generators in maths libraries (such as Intel's MKL, which uses the specialised encryption instructions, so it will be hard to beat by hand!).
