Why use variate_generator when I can just pass the RNG to the distribution? (C++ and Boost, specifically)

Question

Why should I do something like:

#include <boost/random.hpp>
#include <ctime>

using namespace boost;

double SampleNormal (double mean, double sigma)
{
    static mt19937 rng(static_cast<unsigned> (std::time(0)));
    normal_distribution<double> norm_dist(mean, sigma);
    variate_generator<mt19937&, normal_distribution<double> > normal_sampler(rng, norm_dist);
    return normal_sampler();
}

when it seems to me that the code:

#include <boost/random.hpp>
#include <ctime>

using namespace boost;

double SampleNormal (double mean, double sigma)
{
    static mt19937 rng(static_cast<unsigned> (std::time(0)));
    normal_distribution<double> norm_dist(mean, sigma);
    return norm_dist(rng);
}

should work just as well.

Why use variate_generator? Does it do something more than what is done in the second example?

A bit of background: I'm running 100 instantiations of a simulation involving 10^7 iterations of a loop where a random process takes place. This means I need really good random numbers.

Answer 1

There is no need to use the variate_generator and both code samples are fine.

The variate_generator is just there for convenience, so that you don't have to pass the rng argument to norm_dist(rng) every time you need a new number.

If you construct a variate_generator<mt19937&, normal_distribution<double> > named normal_distr_rnd_num, you can just call normal_distr_rnd_num() each time you want a new number. This may make the code more readable in some cases.

I'm not sure what you are trying to do with the SampleNormal(double mean, double sigma) function. If you call it very often with the same values for (mean, sigma), it might be worth constructing such a variate_generator object once (let's call it sample_normal) and then calling sample_normal() instead of your function.
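To make the "convenience" point concrete, here is a minimal sketch of what such a wrapper boils down to. It is not Boost's implementation: NormalSampler is a hypothetical name, and the sketch assumes the C++11 standard <random> header, whose std::mt19937 and std::normal_distribution are the standard counterparts of the Boost classes in the question.

```cpp
#include <random>

// Hypothetical helper (not part of Boost): bundles a reference to an engine
// with a distribution, which is roughly the convenience variate_generator offers.
class NormalSampler {
public:
    NormalSampler(std::mt19937& rng, double mean, double sigma)
        : rng_(rng), dist_(mean, sigma) {}

    // Draw one sample; the caller no longer passes the engine each time.
    double operator()() { return dist_(rng_); }

private:
    std::mt19937& rng_;                      // engine is shared by reference, not copied
    std::normal_distribution<double> dist_;  // constructed once and reused
};
```

Constructed once with a fixed (mean, sigma), an object like this can be called repeatedly as sample_normal(). The numbers it returns are the same ones you would get by calling the distribution with the engine directly.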

A comment on the quality of the numbers:
The quality of the numbers depends mostly on the underlying pseudo-random number algorithm, i.e. the generator you choose. mt19937 has a period of 2^19937-1, which is far more than enough for 10^7 (roughly 2^23) numbers, and it shows no "obvious" correlation between two consecutive numbers. However, the numbers are still produced by a single deterministic algorithm. It is possible, however unlikely, that your application is exactly the test that exposes this determinism. So you could also swap in a different pseudo-random number generator and check whether your application gives the same results with an entirely different way of generating pseudo-random numbers.

I'm more concerned about the initialization (seeding) of the generators. If you run 100 instances, it is tempting to do this in parallel. Now, if you run some of them in parallel, two instances may start within the same second. Since you initialize the generator with time(), which has a resolution of one second, these two instances will be seeded with the very same number. Hence both instances will use the exact same sequence of random numbers.
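One way to avoid that collision is to derive each instance's seed from something that is guaranteed to differ, such as its index among the 100 runs. The following is only a sketch under assumptions: MakeInstanceRng, base_seed and instance_id are hypothetical names, and it uses the C++11 standard library (std::seed_seq, std::mt19937) instead of the Boost classes.

```cpp
#include <random>

// Hypothetical helper: build the engine for one simulation instance from a
// shared base seed plus that instance's index, so instances started in the
// same second no longer receive the same seed.
std::mt19937 MakeInstanceRng(unsigned base_seed, unsigned instance_id)
{
    // seed_seq mixes both values into the engine's initial state.
    std::seed_seq seq{base_seed, instance_id};
    return std::mt19937(seq);
}
```

Each instance would then call something like MakeInstanceRng(base_seed, i) with its own i. How the index reaches the program (command-line argument, job array id, and so on) is left open here.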

In scientific applications it is good practice either to seed the random number generators manually (to ensure they are initialized with different seeds) or at least to record/log the seed that was used. This way you are able to reproduce the sequence of pseudo-random numbers and therefore the result of your program.
