
Why is rand() % N sufficient for small distributions?

I've often heard that you should never mod the result of your random number generator if you want a uniform distribution. However, I've seen that using a std::uniform_int_distribution makes no difference for sufficiently small ranges.

Below is an example using both mod and uniform_int_distribution for values 0 - 15:

#include <iomanip>
#include <iostream>
#include <random>

int main() {
    std::mt19937 gen;
    gen.seed(0);

    const int ROWS = 6;
    const int COLS = 10;

    // Reduce each raw generator output with modulo.
    std::cout << "mod: \n";
    for (int i = 0; i < ROWS; ++i) {
        for (int j = 0; j < COLS; ++j) {
            std::cout << std::setw(2) << gen() % 16 << " ";
        }
        std::cout << "\n";
    }
    std::cout << "\n";

    // Reseed so the distribution starts from the same generator state.
    gen.seed(0);
    std::uniform_int_distribution<> distrib(0, 15);

    std::cout << "dist: \n";
    for (int i = 0; i < ROWS; ++i) {
        for (int j = 0; j < COLS; ++j) {
            std::cout << std::setw(2) << distrib(gen) << " ";
        }
        std::cout << "\n";
    }
}

Results:

mod: 
12 15  5  0  3 11  3  7  9  3 
 5  2  4  7  6  8  8 12 10  1 
 6  7  7 14  8  1  5  9 13  8 
 9  4  3  0  3  5 14 15 15  0 
 2  3  8  1  3 13  3  3 14  7 
 0  1  9  9 15  0 15 10  4  7

dist: 
12 15  5  0  3 11  3  7  9  3 
 5  2  4  7  6  8  8 12 10  1 
 6  7  7 14  8  1  5  9 13  8 
 9  4  3  0  3  5 14 15 15  0 
 2  3  8  1  3 13  3  3 14  7 
 0  1  9  9 15  0 15 10  4  7

I guess it has something to do with 2 bytes? I'm just wondering how this is valid mathematically, since it's stepping through the random number generator and modding the results. Does this mean mod creates a uniform distribution if the range is small enough? And why a 2-byte range and not more?

Using the modulo operator will introduce a bias into the returned results whenever the number of unique values returned by your source of random bits is not a multiple of the divisor.
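The condition above can be checked directly for the generator in the question. The following is a minimal sketch, not a definitive test: the helper name `modulo_is_unbiased` and the constant `mt19937_values` are made up for illustration, and it assumes the generator returns every value between `min()` and `max()` with equal probability.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: true when `source_values` (the number of distinct
// values the generator can return) is an exact multiple of `n`, so that
// `gen() % n` maps the same number of inputs to every output.
constexpr bool modulo_is_unbiased(std::uint64_t source_values, std::uint64_t n) {
    return n != 0 && source_values % n == 0;
}

// Number of distinct values std::mt19937 can return (min() is 0, max() is 2^32 - 1).
constexpr std::uint64_t mt19937_values =
    static_cast<std::uint64_t>(std::mt19937::max()) - std::mt19937::min() + 1;

static_assert(mt19937_values == (std::uint64_t{1} << 32), "mt19937 has 2^32 outputs");
static_assert(modulo_is_unbiased(mt19937_values, 16), "2^32 is a multiple of 16");
static_assert(!modulo_is_unbiased(mt19937_values, 3), "2^32 is not a multiple of 3");
```

Since 2^32 is a multiple of every power of two up to 2^32, `gen() % 16` with std::mt19937 is free of modulo bias. Whether uniform_int_distribution also returns the very same numbers, as in the output above, depends on how a particular standard library implements it; the standard does not specify the algorithm.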

As a simple example, if your random source returns 4 bits (0-15) and you want values in the range 0-2, using gen() % N you'll get six 0s, five 1s, and five 2s. This biases your results to the low side.

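Using multiply-then-divide ( gen() * N / RANGE, where RANGE is the number of values the source can return) can still leave an imbalance in the specific number of each result returned, but the imbalance will be spread out among the results rather than piled onto the lowest ones, which reduces or eliminates the low bias. It also has to contend with overflow in the multiplication. With the previous example and truncating integer division there is only one surplus value, so you still get six 0s, five 1s, and five 2s; the spreading shows up once there are several surplus values. With N = 6, for instance, modulo gives counts of 3, 3, 3, 3, 2, 2 for the results 0 through 5, while multiply-then-divide gives 3, 3, 2, 3, 3, 2.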
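To compare the two mappings concretely, one can tally how many source values land on each result. This is only an illustrative sketch: the function names `modulo_counts` and `scaled_counts` are hypothetical, and the source is assumed to return each value in 0 to range - 1 exactly once per cycle, as in the 4-bit example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Count how often each result in [0, n) appears when every value of a
// source with `range` equally likely outputs is reduced with modulo.
std::vector<int> modulo_counts(std::uint32_t range, std::uint32_t n) {
    assert(n > 0 && n <= range);
    std::vector<int> counts(n, 0);
    for (std::uint32_t x = 0; x < range; ++x) {
        ++counts[x % n];
    }
    return counts;
}

// Same tally for multiply-then-divide. The product is widened to 64 bits
// so that x * n cannot overflow.
std::vector<int> scaled_counts(std::uint32_t range, std::uint32_t n) {
    assert(n > 0 && n <= range);
    std::vector<int> counts(n, 0);
    for (std::uint32_t x = 0; x < range; ++x) {
        ++counts[static_cast<std::uint64_t>(x) * n / range];
    }
    return counts;
}
```

For a 4-bit source (range = 16) and n = 6, `modulo_counts` returns 3, 3, 3, 3, 2, 2 and `scaled_counts` returns 3, 3, 2, 3, 3, 2: the same number of surplus values, but no longer concentrated on the lowest results.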

A third alternative would be to check the returned bits to see if the value is among the highest values (the ones that would cause the bias) and regenerate the random bits if this is the case. This introduces a conditional in the code, and the time to generate a random number is open ended (rather than fixed).
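The redraw approach could look roughly like the sketch below. It is written for std::mt19937 specifically and assumes its 2^32 possible outputs; the function name `unbiased_below` is hypothetical, and this is not necessarily how any standard library implements uniform_int_distribution.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: returns a value in [0, n) without modulo bias by
// discarding draws that fall in the incomplete top block of the range.
std::uint32_t unbiased_below(std::mt19937& gen, std::uint32_t n) {
    assert(n > 0);
    const std::uint64_t range = std::uint64_t{1} << 32;  // mt19937 yields 2^32 values
    const std::uint64_t limit = range - range % n;       // largest multiple of n <= 2^32
    std::uint64_t x;
    do {
        x = gen();            // redraw while x lies in the biased tail
    } while (x >= limit);
    return static_cast<std::uint32_t>(x % n);
}
```

When n divides 2^32 (for example n = 16), `limit` equals the full range and the loop never repeats, so the result is simply `gen() % n`. For other n, fewer than half of all draws are rejected, so the loop ends quickly on average even though it has no fixed upper bound.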
