
Distribution of Number of Digits of Random Numbers

I encountered this curious phenomenon while trying to implement a UUID generator in JavaScript.

Basically, in JavaScript, if I generate a large list of random numbers with the built-in Math.random() on Node 4.2.2:

var records = {};
var l;
for (var i=0; i < 1e6; i += 1) {
  l = String(Math.random()).length;
  if (records[l]) {
    records[l] += 1;
  } else {
    records[l] = 1;
  }
}
console.log(records);

The numbers of digits follow a strange pattern:

{ '12': 1,
  '13': 11,
  '14': 65,
  '15': 663,
  '16': 6619,
  '17': 66378,
  '18': 611441,
  '19': 281175,
  '20': 30379,
  '21': 2939,
  '22': 282,
  '23': 44,
  '24': 3 }

I thought this was a quirk of V8's random number generator, but a similar pattern appears in Python 3.4.3:

12 : 2
13 : 5
14 : 64
15 : 672
16 : 6736
17 : 66861
18 : 610907
19 : 280945
20 : 30455
21 : 3129
22 : 224

The Python code is as follows:

import random
random.seed()
records = {}
for i in range(0, 1000000):
    n = random.random()
    l = len(str(n))
    try:
        records[l] += 1
    except KeyError:
        records[l] = 1

for i in sorted(records):
    print(i, ':', records[i])

The pattern for 18 digits and below is expected: say a random number should have 20 digits; if its last digit happens to be 0, it effectively has only 19 digits. If the random number generator is good, the probability of that happening is roughly 1/10.
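
To make that 1/10 intuition concrete, here is a small sketch I have added (not part of the original question): for uniformly random 6-digit integers, about 1/10 end in one trailing zero, about 1/100 in two, and so on, which is exactly the falloff seen on the short side of the distribution above.

var shorter = {0: 0, 1: 0, 2: 0, 3: 0};
var n, zeros;
for (var i = 0; i < 1e6; i += 1) {
  n = Math.floor(Math.random() * 1e6);
  zeros = 0;
  // count trailing decimal zeros, capped at 3
  while (zeros < 3 && n !== 0 && n % 10 === 0) {
    zeros += 1;
    n = n / 10;
  }
  shorter[zeros] += 1;
}
// roughly { '0': ~900000, '1': ~90000, '2': ~9000, '3': ~1000 }
console.log(shorter);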

But why is the pattern reversed for 19 and beyond?

I guess this is related to the binary representation of floating point numbers, but I can't figure out exactly why.

The reason is indeed related to floating point representation. A floating point representation has a maximum number of (binary) digits it can hold, and a limited exponent value range. When you print such a number without scientific notation, you may in some cases need a few zeroes after the decimal point before the significant digits start.
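
As a rough sketch of the sizes involved (an added illustration, not part of the original answer): an IEEE-754 double carries 53 significant bits, which amounts to roughly 16 significant decimal digits, so a printed random number is essentially "0.", plus any leading zeros implied by the exponent, plus those 15-17 significant digits.

// 53 significant bits correspond to about 53 * log10(2) ≈ 15.95 decimal digits
console.log(53 * Math.log10(2));       // ~15.95
// so String(Math.random()) is "0." plus 15-17 significant digits,
// most often 18 or 19 characters in total, per the distribution above
console.log(String(Math.random()).length);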

You can visualize this effect by printing the random numbers that have the longest length when converted to a string:

var records = {};
var l, r;
for (var i=0; i < 1e6; i += 1) {
    r = Math.random();
    l = String(r).length;
    if (l === 23) {
        console.log(r);
    }
    if (records[l]) {
        records[l] += 1;
    } else {
        records[l] = 1;
    }
}

This prints only the 23-character strings, and you will get numbers like these:

0.000007411070483631654
0.000053944830052166104
0.000018188989763578967
0.000029525788901141325
0.000009613635131744402
0.000005937417234758158
0.000021099748521158368

Notice the zeroes before the first non-zero digit. These are actually not stored in the number part of the floating point representation, but implied by its exponent part.
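
One added way to see that (my own illustration, using one of the values printed above): ask for the exponential form; the significand keeps only the 16 significant digits, and the leading zeros are folded into the exponent.

var tiny = 0.000007411070483631654;        // one of the 23-character values above
console.log(String(tiny).length);          // 23
console.log(tiny.toExponential());         // 7.411070483631654e-6
console.log(tiny.toExponential().length);  // 20 -- the leading zeros are gone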

If you were to take out the leading zeroes and then make a count:

var records = {};
var l, r, s;
for (var i=0; i < 1e6; i += 1) {
    r = Math.random();
    s = String(r).replace(/^[0\.]+/, '');
    l = s.length;

    if (records[l]) {
        records[l] += 1;
    } else {
        records[l] = 1;
    }
}

... you'll get results which are less strange.

However, you will see some irregularity due to how JavaScript converts tiny numbers to strings: when they get too small, scientific notation is used in the string representation. You can see this with the following script (not sure if every browser has the same breaking point, so maybe you need to play a bit with the number):

var i = 0.00000123456789012345678;
console.log(String(i), String(i/10));

This gives me the following output:

0.0000012345678901234567 1.2345678901234568e-7

So very small numbers get a more fixed string length as a result, quite often 22 characters, while in the non-scientific notation a length of 23 is common. This also influences the second script I provided, and length 22 will get more hits than 23.
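
For what it's worth, in engines following the standard ECMAScript number-to-string rules the switch seems to happen once the magnitude drops below 1e-6; a quick added check:

console.log(String(0.000001));    // "0.000001"  -- still plain decimal
console.log(String(0.0000001));   // "1e-7"      -- switches to exponential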

It should be noted that JavaScript does not switch to scientific notation when converting to a string in binary representation:

var i = 0.1234567890123456789e-120;
console.log(i.toString(2));

The above will print a string of over 450 binary digits!
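
As a rough sanity check of that length (my own arithmetic, not from the original answer): 0.1234567890123456789e-120 is about 1.23e-121, which is roughly 2^-402, so its first significant bit sits about 402 places after the binary point, and the remaining fraction bits of the double extend the expansion to roughly 454 binary digits.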

It's because some of the values are like this:

0.00012345...

And thus they're longer.
