I'm running the following program:
#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib>
#include <chrono>
using namespace std;
const int N = 200; // Number of tests.
const int M = 2000000; // Number of pseudo-random values generated per test.
const int VALS = 2; // Number of possible values (values from 0 to VALS-1).
const int ESP = M / VALS; // Expected number of appearances of each value per test.
int main() {
    for (int i = 0; i < N; ++i) {
        unsigned seed = chrono::system_clock::now().time_since_epoch().count();
        srand(seed);
        vector<int> hist(VALS, 0);
        for (int j = 0; j < M; ++j) ++hist[rand() % VALS];
        int Y = 0;
        for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
        cout << Y << endl;
    }
}
This program performs N tests. In each test we generate M numbers between 0 and VALS-1 while we keep counting their appearances in a histogram. Finally, we accumulate in Y the errors, which correspond to the difference between each value of the histogram and the expected value. Since the numbers are generated randomly, each of them would ideally appear M/VALS times per test.
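As a reference point for what "ideal" would look like, the distribution of Y can be worked out for VALS = 2. This is only a sketch: it assumes the draws are independent and uniform, which is exactly the property in question, and h_0 and h_1 are names introduced here for the two histogram counts.

```latex
\begin{align*}
h_0 + h_1 &= M \\
Y &= \left|h_0 - \tfrac{M}{2}\right| + \left|h_1 - \tfrac{M}{2}\right|
   = 2\left|h_0 - \tfrac{M}{2}\right| \\
h_0 &\sim \mathrm{Binomial}\!\left(M, \tfrac{1}{2}\right), \qquad
   \mathbb{E}[h_0] = \tfrac{M}{2}, \qquad \mathrm{Var}(h_0) = \tfrac{M}{4} \\
h_0 - \tfrac{M}{2} &\approx \mathcal{N}\!\left(0, \tfrac{M}{4}\right)
   \;\Longrightarrow\; Y \approx |Z|, \qquad Z \sim \mathcal{N}(0, M) \\
\mathbb{E}[Y] &\approx \sqrt{\tfrac{2M}{\pi}} \approx 1128 \qquad \text{for } M = 2\,000\,000
\end{align*}
```

Under that assumption Y would be approximately half-normal with scale sqrt(M), about 1414 for M = 2000000.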
After running my program I analysed the resulting data (i.e., the 200 values of Y) and noticed some things happening that I cannot explain. If the program is compiled with VC++, then for fixed N and VALS (N = 200 and VALS = 2 in this case) we get different data patterns for different values of M. For some values of M the resulting data follows a normal distribution, and for others it doesn't. Moreover, these two kinds of results seem to alternate as M (the number of pseudo-random values generated in each test) increases:
As you can see, depending on the value of M the resulting data follows a normal distribution or otherwise follows a non-normal distribution (bimodal, dog food or kind of uniform) in which more extreme values of Y have greater presence.
This diversity of results doesn't occur if we compile the program with other C++ compilers (gcc and clang). In this case, it looks like we always obtain a half-normal distribution of Y values:
What are your thoughts on this? What is the explanation?
I carried out the tests through this online compiler: http://rextester.com/l/cpp_online_compiler_visual
The program will generate poorly distributed random numbers (not uniform, not independent). The rand function is a notoriously poor one. Using % to bring the numbers into range effectively discards all but the low-order bits.
[edit] I just noticed const int ESP = M / VALS; in your code. You want a floating point number instead.
Try the code below and report back. Using the new <random> is a little tedious. Many people write some small library code to simplify its use.
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
#include <chrono>
using namespace std;
const int N = 200; // Number of tests.
const int M = 2000000; // Number of pseudo-random values generated per test.
const int VALS = 2; // Number of possible values (values from 0 to VALS-1).
const double ESP = (1.0*M)/VALS; // Expected number of appearances of each value per test.
static std::default_random_engine engine;
static void seed() {
    std::random_device rd;
    engine.seed(rd());
}
static int rand_int(int lo, int hi) {
    std::uniform_int_distribution<int> dist (lo, hi - 1);
    return dist(engine);
}
int main() {
    seed();
    for (int i = 0; i < N; ++i) {
        vector<int> hist(VALS, 0);
        for (int j = 0; j < M; ++j) ++hist[rand_int(0, VALS)];
        int Y = 0;
        for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
        cout << Y << endl;
    }
}