
Why do digits 1, 2 and 3 appear so frequently when using the C rand() function?

What I am trying to do is generate some random numbers (not necessarily single-digit), like

29106
7438
5646
4487
9374
28671
92
13941
25226
10076

and then count how many times each digit appears:

count[0] =       3  Percentage =  6.82
count[1] =       5  Percentage = 11.36
count[2] =       6  Percentage = 13.64
count[3] =       3  Percentage =  6.82
count[4] =       6  Percentage = 13.64
count[5] =       2  Percentage =  4.55
count[6] =       7  Percentage = 15.91
count[7] =       5  Percentage = 11.36
count[8] =       3  Percentage =  6.82
count[9] =       4  Percentage =  9.09

This is the code I am using:

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

int main(void) {

    int i;
    srand((unsigned)time(NULL));
    FILE* fp = fopen("random.txt", "w");
    if (fp == NULL)
        return 1;
    // for(i = 0; i < 10; i++)
    for(i = 0; i < 1000000; i++)
        fprintf(fp, "%d\n", rand());
    fclose(fp);

    int dummy;
    long count[10] = {0,0,0,0,0,0,0,0,0,0};
    fp = fopen("random.txt", "r");
    if (fp == NULL)
        return 1;
    // Read one digit at a time and stop as soon as fscanf fails.
    // (Checking feof() before reading would count the last digit twice.)
    while(fscanf(fp, "%1d", &dummy) == 1)
        count[dummy]++;
    fclose(fp);

    long sum = 0;
    for(i = 0; i < 10; i++)
        sum += count[i];

    for(i = 0; i < 10; i++)
        printf("count[%d] = %7ld  Percentage = %5.2f\n",
            i, count[i], ((float)(100 * count[i])/sum));

    return 0;
}

If I generate a large number of random numbers (1000000), this is the result I get:

count[0] =  387432  Percentage =  8.31
count[1] =  728339  Percentage = 15.63
count[2] =  720880  Percentage = 15.47
count[3] =  475982  Percentage = 10.21
count[4] =  392678  Percentage =  8.43
count[5] =  392683  Percentage =  8.43
count[6] =  392456  Percentage =  8.42
count[7] =  391599  Percentage =  8.40
count[8] =  388795  Percentage =  8.34
count[9] =  389501  Percentage =  8.36

Notice that 1, 2 and 3 have too many hits. I have tried running this several times and each time I get very similar results.

I am trying to understand what could cause 1, 2 and 3 to appear much more frequently than any other digit.


Taking a hint from what Matt Joiner and Pascal Cuoq pointed out, I changed the code to use

for(i = 0; i < 1000000; i++)
    fprintf(fp, "%04d\n", rand() % 10000);
// %04d zero-pads each value to four digits (e.g. 42 -> 0042),
// so every line holds a number in the range 0000 to 9999

and this is what I get (similar results on multiple runs):

count[0] =  422947  Percentage = 10.57
count[1] =  423222  Percentage = 10.58
count[2] =  414699  Percentage = 10.37
count[3] =  391604  Percentage =  9.79
count[4] =  392640  Percentage =  9.82
count[5] =  392928  Percentage =  9.82
count[6] =  392737  Percentage =  9.82
count[7] =  392634  Percentage =  9.82
count[8] =  388238  Percentage =  9.71
count[9] =  388352  Percentage =  9.71

What could be the reason that 0, 1 and 2 are favored?


Thanks, everyone. Using

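int rand2(void) {
    /* Redraw any value of 30000 or more, keeping only the three complete
       blocks 0..29999 (this assumes RAND_MAX == 32767, as here). */
    int num;
    do {
        num = rand();
    } while (num >= 30000);
    return num;
}

    fprintf(fp, "%04d\n", rand2() % 10000);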

I get:

count[0] =  399629  Percentage =  9.99
count[1] =  399897  Percentage = 10.00
count[2] =  400162  Percentage = 10.00
count[3] =  400412  Percentage = 10.01
count[4] =  399863  Percentage = 10.00
count[5] =  400756  Percentage = 10.02
count[6] =  399980  Percentage = 10.00
count[7] =  400055  Percentage = 10.00
count[8] =  399143  Percentage =  9.98
count[9] =  400104  Percentage = 10.00

rand() generates a value from 0 to RAND_MAX. RAND_MAX is implementation-defined; the C standard only guarantees that it is at least 32767. Common values are 32767 (for example, in Microsoft's C runtime) and 2147483647 (for example, in glibc, where it equals INT_MAX).

In your example, RAND_MAX appears to be 32767. Every value from 10000 to 32767 has five digits and its most significant digit is 1, 2 or 3, so those three digits appear unusually often. To a lesser degree, digits up to 6 and 7 are also slightly favored, because the range stops at 32767: among 30000 to 32767 the hundreds digit only goes up to 7, and among 32700 to 32767 the tens digit only goes up to 6.
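To check the leading-digit explanation, here is a small brute-force sketch (the function names are made up for illustration). It assumes a perfectly uniform rand() with RAND_MAX == 32767, writes out every value from 0 to 32767 exactly once, the way the question's "%d\n" output would, and tallies the digits:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: count how often each decimal digit appears when
   every integer in [0, max] is written once without padding ("%d"),
   i.e. the expected tally for a perfectly uniform rand() whose
   RAND_MAX equals max. */
void tally_digits(int max, long count[10])
{
    assert(max >= 0 && max < INT_MAX);   /* keeps v <= max from overflowing */
    for (int d = 0; d < 10; d++)
        count[d] = 0;
    for (int v = 0; v <= max; v++) {
        int x = v;
        do {                    /* do-while so that 0 contributes one '0' */
            count[x % 10]++;
            x /= 10;
        } while (x != 0);
    }
}

/* Print each digit's share in the same format as the question. */
void print_shares(const long count[10])
{
    long sum = 0;
    for (int d = 0; d < 10; d++)
        sum += count[d];
    for (int d = 0; d < 10; d++)
        printf("count[%d] = %7ld  Percentage = %5.2f\n",
               d, count[d], 100.0 * count[d] / sum);
}
```

With `long c[10]; tally_digits(32767, c); print_shares(c);` digit 1 comes out at 15.62%, digit 2 at 15.47% and digit 3 at 10.23%, very close to the 15.63%, 15.47% and 10.21% measured in the question.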

Regarding the edited question:

This is because the digits are still not uniformly distributed even after you apply % 10000. Assume RAND_MAX == 32767 and that rand() is perfectly uniform.

In every complete block of 10,000 consecutive numbers starting from 0, all digits appear equally often (4,000 times each across the four zero-padded positions). However, rand() can return 32,768 different values (0 to 32,767), which is not a multiple of 10,000. The first 30,000 values form three complete blocks; the remaining 2,768 values (30000 to 32767) become 0000 to 2767 after % 10000, so they add extra leading 0s, 1s and 2s to the final count.

The exact contribution from these 2,768 numbers is:

digit  count
0      1857
1      1857
2      1625
3      857
4      857
5      857
6      855
7      815
8      746
9      746

Adding 12,000 per digit for the first 30,000 numbers (30,000 numbers × 4 digits / 10 digit values) and then dividing by the total number of digits (4 × 32,768 = 131,072) should give you the expected distribution; for digit 0, for example, (12,000 + 1,857) / 131,072 ≈ 10.5721%:

digit   probability (%)
0       10.5721
1       10.5721
2       10.3951
3        9.80911
4        9.80911
5        9.80911
6        9.80759
7        9.77707
8        9.72443
9        9.72443

which is close to what you get.
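Both tables above can be reproduced by brute force. This sketch (again with hypothetical names, and again assuming a uniform rand() with RAND_MAX == 32767) reduces each value with % 10000 and tallies the four zero-padded digits that "%04d" would print:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: for every v in [lo, hi], take v % 10000, write it
   as four zero-padded digits (as "%04d" does) and tally those digits. */
void tally_mod10000_digits(int lo, int hi, long count[10])
{
    assert(0 <= lo && lo <= hi && hi < INT_MAX);
    for (int d = 0; d < 10; d++)
        count[d] = 0;
    for (int v = lo; v <= hi; v++) {
        int x = v % 10000;
        for (int pos = 0; pos < 4; pos++) {   /* always exactly 4 digits */
            count[x % 10]++;
            x /= 10;
        }
    }
}
```

`tally_mod10000_digits(30000, 32767, c)` yields the contribution of the 2,768 leftover numbers (1857 zeros, 1857 ones, 1625 twos, and so on), and over the full range 0 to 32767 it yields 13857 zeros out of 131,072 digits, i.e. about 10.5721%.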

If you want a truly uniform digit distribution, you need to reject those 2,768 numbers:

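int rand_4digits(void) {
  /* Largest multiple of 10000 that does not exceed RAND_MAX
     (30000 when RAND_MAX == 32767). Draws at or above it are
     rejected, so every residue 0..9999 is equally likely. */
  const int RAND_MAX_4_DIGITS = RAND_MAX - RAND_MAX % 10000;
  int res;
  do {
    res = rand();
  } while (res >= RAND_MAX_4_DIGITS);
  return res % 10000;
}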
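The same rejection idea works for any range size. In the sketch below, rand_below is a hypothetical helper, not a standard function; it counts the RAND_MAX + 1 possible outcomes in unsigned long so the addition cannot overflow when RAND_MAX == INT_MAX, and keeps only the largest multiple of n:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return a uniformly distributed integer in [0, n),
   assuming rand() itself is uniform over 0..RAND_MAX. */
int rand_below(int n)
{
    assert(n > 0 && n <= RAND_MAX);
    /* rand() has RAND_MAX + 1 possible outcomes; count them in unsigned
       long so the + 1 cannot overflow when RAND_MAX == INT_MAX. */
    unsigned long outcomes = (unsigned long)RAND_MAX + 1u;
    /* Largest multiple of n that is <= outcomes: draws below it map onto
       every residue 0..n-1 the same number of times. */
    unsigned long limit = outcomes - outcomes % (unsigned long)n;
    int r;
    do {
        r = rand();
    } while ((unsigned long)r >= limit);
    return r % n;
}
```

The kept range is always more than half of all outcomes, so on average the loop needs fewer than two calls to rand().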

Looks like Benford's law (see http://en.wikipedia.org/wiki/Benford%27s_law), or a not-very-good RNG.

That's because you generate numbers between 0 and RAND_MAX. The generated numbers are evenly distributed (i.e. each number has approximately the same probability); however, in this range the digits 1, 2 and 3 occur more often than the others. Try generating numbers between 0 and 9, where each digit occurs exactly once in the range, and you'll get a nice distribution.

If I understand what the OP (the person asking the question) wants, they want to generate better random numbers.

rand() and random(), quite frankly, don't produce very good random numbers; they both do poorly when tested with diehard and dieharder (two packages for testing the quality of random numbers).

The Mersenne Twister is a popular random number generator that is good for pretty much everything except cryptographically strong random numbers; it passes all of the diehard(er) tests with flying colors.
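For comparison with rand(), here is a compact sketch of the 32-bit Mersenne Twister (MT19937) following Matsumoto and Nishimura's reference algorithm. The type and function names (mt19937, mt_seed, mt_next) are my own, this is an illustration rather than a vetted library, and like any Mersenne Twister it is not suitable for cryptographic use:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MT_N 624   /* state size in 32-bit words */
#define MT_M 397   /* middle word offset used by the twist */

typedef struct {
    uint32_t state[MT_N];
    int index;                 /* next state word to temper and return */
} mt19937;

/* Fill the state from a 32-bit seed (the reference init_genrand recurrence). */
void mt_seed(mt19937 *mt, uint32_t seed)
{
    assert(mt != NULL);
    mt->state[0] = seed;
    for (int i = 1; i < MT_N; i++) {
        uint32_t prev = mt->state[i - 1];
        mt->state[i] = (uint32_t)(1812433253u * (prev ^ (prev >> 30)) + (uint32_t)i);
    }
    mt->index = MT_N;          /* forces a twist before the first output */
}

/* Regenerate all MT_N state words in place. */
static void mt_twist(mt19937 *mt)
{
    for (int i = 0; i < MT_N; i++) {
        uint32_t y = (mt->state[i] & 0x80000000u)
                   | (mt->state[(i + 1) % MT_N] & 0x7fffffffu);
        uint32_t next = mt->state[(i + MT_M) % MT_N] ^ (y >> 1);
        if (y & 1u)
            next ^= 0x9908b0dfu;
        mt->state[i] = next;
    }
    mt->index = 0;
}

/* Return the next 32-bit output. */
uint32_t mt_next(mt19937 *mt)
{
    if (mt->index >= MT_N)
        mt_twist(mt);
    uint32_t y = mt->state[mt->index++];
    /* Tempering: improves the equidistribution of the output bits. */
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}
```

Seeded with 5489 (the reference default), the first output should be 3499211612 and the 10,000th should be 4123659995, the value the C++ standard requires from a default-constructed std::mt19937.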

If you need cryptographically strong random numbers (numbers that cannot be guessed, even by someone who knows which particular algorithm is being used), there are a number of stream ciphers out there. The one I like to use is called RadioGatún[32], and here's a compact C representation of it:

/*Placed in the public domain by Sam Trenholme*/
#include <stdint.h>
#include <stdio.h> 
#define p uint32_t
#define f(a) for(c=0;c<a;c++)
#define n f(3){b[c*13]^=s[c];a[16+c]^=s[c];}k(a,b 
void k(p *a,p *b){p A[19],x,y,r,q[3],c,i;f(3){q[c]=b[c
*13+12];}for(i=12;i;i--){f(3){b[c*13+i]=b[c*13+i- 
1];}}f(3){b[c*13]=q[c];}f(12){i=c+1+((c%3)*13);b[
i]^=a[c+1];}f(19){y=(c*7)%19;r=((c*c+c)/2)%32;x=a
[y]^(a[(y+1)%19]|(~a[(y+2)%19]));A[c]=(x>>r)|(x<<
((32-r)%32));}f(19){a[c]=A[c]^A[(c+1)%19]^A[(c+4)%19];
}a[0]^=1;f(3){a[c+13]^=q[c];}}void l(p *a,p *b,char *v
){p s[3],q,c,r,x,d=0;for(;;){f(3){s[c]=0;}for(r=0
;r<3;r++){for(q=0;q<4;q++){if(!(x=*v&255)){d=x=1;
}v++;s[r]|=x<<(q*8);if(d){n);return;}}}n);}}int main(
int j,char **h){p a[39],b[39],c,e,g;if(j==2){f(39
){a[c]=b[c]=0;}l(a,b,h[1]);f(16){k(a,b);}f(4){k(a
,b);for(j=1;j<3;++j){g=a[j];for(e=4;e;e--){printf
("%02x",g&255);g>>=8;}}}printf("\n");}}

There are also a lot of other really good random number generators out there.

When you want to generate a random value in the range [0, x), instead of doing rand()%x, you should apply the formula x*((double)rand()/((double)RAND_MAX + 1)) (the + 1 keeps rand() == RAND_MAX from producing x itself), which spreads the values much more evenly over the range.
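A minimal sketch of the scaling formula, using only the standard rand(); scale_to_range and rand_scaled are hypothetical helper names:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: map one rand() result r (0 <= r <= RAND_MAX) into
   [0, x) by scaling. Dividing by RAND_MAX + 1.0 instead of RAND_MAX keeps
   r == RAND_MAX from producing x itself. */
int scale_to_range(int r, int x)
{
    assert(r >= 0 && r <= RAND_MAX && x > 0);
    return (int)(x * (r / (RAND_MAX + 1.0)));
}

/* Draw a value in [0, x) with the scaling formula. */
int rand_scaled(int x)
{
    return scale_to_range(rand(), x);
}
```

Splitting the pure mapping from the call to rand() makes the boundary cases easy to check: scale_to_range(0, 10) is 0 and scale_to_range(RAND_MAX, 10) is 9.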

Say RAND_MAX is equal to 15, so rand() gives you integers from 0 to 15. When you use the modulo operator to get random numbers from [0, 10), the values in [0, 5] will have a higher frequency than those in [6, 9], because each of them can be reached in two ways; for example, 3 == 3%10 == 13%10.
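The RAND_MAX == 15 example can be worked through exhaustively. This sketch uses hypothetical helpers for a toy generator whose 16 outputs are equally likely, and counts how many of the outputs 0..15 land on each result in [0, 10) under % 10 and under the scaling formula:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers for a toy generator whose outputs 0..max are all
   equally likely: hits[i] receives how many of those outputs end up as
   result i in [0, x) under each mapping. */
void map_counts_modulo(int max, int x, int hits[])
{
    assert(max >= 0 && max < INT_MAX && x > 0);
    for (int i = 0; i < x; i++)
        hits[i] = 0;
    for (int r = 0; r <= max; r++)
        hits[r % x]++;
}

void map_counts_scaled(int max, int x, int hits[])
{
    assert(max >= 0 && max < INT_MAX && x > 0);
    for (int i = 0; i < x; i++)
        hits[i] = 0;
    for (int r = 0; r <= max; r++)
        hits[(int)(x * (r / (max + 1.0)))]++;
}
```

With max = 15 and x = 10, the modulo mapping gives two hits to each of 0 to 5 and one to each of 6 to 9. The scaled mapping still gives six results two hits and four results one hit (16 outputs cannot be split evenly over 10 results), but the doubled results (0, 1, 3, 5, 6 and 8) are spread across the range instead of being the six smallest values. Removing the bias entirely requires rejection, as in rand_4digits.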

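Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.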

 
© 2020-2024 STACKOOM.COM