
Why does the VC++ compiler cause this statistical pattern?

I'm running the following program:

```cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib>
#include <chrono>
using namespace std;

const int N = 200;          // Number of tests.
const int M = 2000000;      // Number of pseudo-random values generated per test.
const int VALS = 2;         // Number of possible values (values from 0 to VALS-1).
const int ESP = M / VALS;   // Expected number of appearances of each value per test.

int main() {
    for (int i = 0; i < N; ++i) {
        // Re-seed rand() from the system clock at the start of every test.
        unsigned seed = chrono::system_clock::now().time_since_epoch().count();
        srand(seed);
        // Histogram of rand() % VALS over M draws.
        vector<int> hist(VALS, 0);
        for (int j = 0; j < M; ++j) ++hist[rand() % VALS];
        // Y = sum of the absolute deviations of each count from the expected count.
        int Y = 0;
        for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
        cout << Y << endl;
    }
}
```

This program performs N tests. In each test we generate M numbers between 0 and VALS-1 and count their occurrences in a histogram. Finally, we accumulate in Y the absolute errors, i.e. the absolute difference between each histogram count and the expected count. Since the numbers are generated uniformly at random, each value is expected to appear M/VALS times per test.

After running my program I analysed the resulting data (i.e. the 200 values of Y) and noticed some things I cannot explain. If the program is compiled with VC++, then for fixed N and VALS (N = 200 and VALS = 2 in this case) we get different data patterns for different values of M. For some values of M the resulting data follows a normal distribution, and for others it doesn't. Moreover, these two kinds of results seem to alternate as M (the number of pseudo-random values generated in each test) increases:

  • M = 10K, data is not normal:

[Images not available: two plots of the Y values for M = 10K]

  • M = 100K, data is normal:

[Images not available: two plots of the Y values for M = 100K]

  • and so on:

[Images not available: three more pairs of plots of the Y values for larger values of M]

As you can see, depending on the value of M the resulting data either follows a normal distribution or follows a non-normal distribution (bimodal, "dog food", or roughly uniform) in which extreme values of Y are more frequent.

This diversity of results doesn't occur if we compile the program with other C++ compilers (gcc and clang). In this case, it looks like we always obtain a half-normal distribution of Y values:

[Images not available: two plots of the Y values obtained with gcc and clang]
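For VALS = 2, the half-normal shape is what independent, equally likely draws predict. A short derivation, assuming each draw is independent and equally likely to be 0 or 1, with X standing for hist[0]:

```latex
\begin{align*}
X &= \texttt{hist[0]} \sim \mathrm{Binomial}\!\left(M, \tfrac{1}{2}\right), \qquad \texttt{hist[1]} = M - X \\
Y &= \left|X - \tfrac{M}{2}\right| + \left|(M - X) - \tfrac{M}{2}\right| = 2\left|X - \tfrac{M}{2}\right| \\
X - \tfrac{M}{2} &\approx \mathcal{N}\!\left(0, \tfrac{M}{4}\right) \;\Longrightarrow\; Y \approx |Z|, \quad Z \sim \mathcal{N}(0, M) \\
\mathbb{E}[Y] &\approx \sqrt{\tfrac{2M}{\pi}} \approx 1128 \quad \text{for } M = 2\,000\,000
\end{align*}
```

So with independent draws, Y should be half-normal with a spread that grows like √M for every M; a shape that changes with M suggests that the draws are not independent.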

What are your thoughts on this? What is the explanation?

I carried out the tests through this online compiler: http://rextester.com/l/cpp_online_compiler_visual

The program will generate poorly distributed random numbers (neither uniform nor independent):

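  1. The C library function rand is notoriously poor in many implementations.
  2. Using the remainder operator % to bring the numbers into range effectively discards all but the low-order bits, and in the linear congruential generators commonly used to implement rand, the low-order bits are the least random.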
  3. The RNG is re-seeded every time through the loop.

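[edit] I just noticed const int ESP = M / VALS;. You want a floating-point value instead, because integer division truncates whenever M is not a multiple of VALS (for example, 10 / 3 evaluates to 3, not 3.33).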

Try the code below and report back. Using the newer <random> header is a little tedious, so many people write some small library code to simplify its use.

```cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
using namespace std;

const int N = 200;          // Number of tests.
const int M = 2000000;      // Number of pseudo-random values generated per test.
const int VALS = 2;         // Number of possible values (values from 0 to VALS-1).
const double ESP = static_cast<double>(M) / VALS; // Expected number of appearances of each value per test.

static std::default_random_engine engine;  // The engine type is implementation-defined.

// Seed the engine once, from a non-deterministic source where one is available.
static void seed() {
    std::random_device rd;
    engine.seed(rd());
}

// Uniform integer in [lo, hi), without the bias of rand() % n.
static int rand_int(int lo, int hi) {
    std::uniform_int_distribution<int> dist(lo, hi - 1);
    return dist(engine);
}

int main() {
    seed();  // Seed once, not once per test.
    for (int i = 0; i < N; ++i) {
        vector<int> hist(VALS, 0);
        for (int j = 0; j < M; ++j) ++hist[rand_int(0, VALS)];
        double Y = 0;  // double, because ESP is now a double.
        for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
        cout << Y << endl;
    }
}
```
