Is the numpy random generator biased?

Question

The numpy.random.choice method can generate a random sample without replacement even when different elements have different probabilities. However, when I test it with

import numpy

a = [0, 1, 2, 3, 4, 5]
p = [0.1, 0.3, 0.3, 0.1, 0.1, 0.1]
result = [0, 0, 0, 0, 0, 0]
N = 1000000
k = 3

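# Draw N samples of k distinct elements and count how often each element appears
for i in range(0, N):
    temp = numpy.random.choice(a, k, replace=False, p=p)
    for j in temp:
        result[j] += 1
# Convert the counts to each element's share of all N * k drawn values
for i in range(0, 6):
    result[i] /= (N * k)
print(result)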

the second and third elements (1 and 2) each make up only about 25% of the results, far from the expected 30%. I tried other probability distributions (e.g. [0.1, 0.2, 0.3, 0.1, 0.1, 0.2]) and every time the result did not match the expectation. Is there a problem with my code, or is numpy really that inaccurate?

Answer 1

Your understanding of the np.random.choice function is wrong, specifically of the replace= option. The documentation says that replace=False means that once an item has been chosen, it cannot be chosen again. This can be shown by running

import numpy as np

for _ in range(100):
    assert set(np.random.choice(np.arange(5), 5, replace=False)) == set(range(5))

and seeing that no error is ever raised. The order changes, but all 5 values are always returned.
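The same property holds when weights are passed through p. The sketch below uses NumPy's newer Generator API (np.random.default_rng) rather than the legacy np.random.choice, with an arbitrarily chosen seed, to contrast the two modes on the question's data:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
a = [0, 1, 2, 3, 4, 5]
p = [0.1, 0.3, 0.3, 0.1, 0.1, 0.1]

# replace=False: the 3 values within one sample are always distinct
samples_without = [rng.choice(a, 3, replace=False, p=p) for _ in range(1000)]
all_distinct = all(len(set(s)) == 3 for s in samples_without)

# replace=True (the default): a value may repeat within one sample
samples_with = [rng.choice(a, 3, p=p) for _ in range(1000)]
any_repeat = any(len(set(s)) < 3 for s in samples_with)

print(all_distinct, any_repeat)  # True True
```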

Your current method gives strange results because of this property. Even though 1 and 2 each have a 0.3 chance of being the first item, they have less than a 0.3 chance of being the second or third item, because if one of them was picked first, it cannot be picked again later. Averaged over the three positions, their share therefore falls below 0.3, and the less likely elements make up the difference.
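To check that the roughly 25% seen in the question is exactly what sampling without replacement should produce, the sketch below computes each element's expected share by enumerating every ordered sample of 3 distinct elements. It assumes that replace=False with p behaves like successive draws in which each remaining element is picked with probability proportional to its weight (my reading of NumPy's implementation, not something stated in the question); the helper name inclusion_shares is hypothetical.

```python
from itertools import permutations

def inclusion_shares(p, k):
    """Expected share of each index among all k picked values, assuming
    successive draws without replacement, each proportional to the
    weights of the indices that are still available."""
    n = len(p)
    shares = [0.0] * n
    # Enumerate every ordered sequence of k distinct indices
    for seq in permutations(range(n), k):
        prob = 1.0
        remaining = sum(p)
        for i in seq:
            prob *= p[i] / remaining  # pick i from what is left
            remaining -= p[i]         # i can no longer be picked
        for i in seq:
            shares[i] += prob / k     # i is 1 of the k values in this sample
    return shares

p = [0.1, 0.3, 0.3, 0.1, 0.1, 0.1]
print([round(s, 4) for s in inclusion_shares(p, 3)])
# [0.1226, 0.2548, 0.2548, 0.1226, 0.1226, 0.1226]
```

Elements 1 and 2 come out at about 0.2548, matching the question's observation, so the generator is not biased; the shares are simply not expected to equal p when replace=False.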

The solution is to use replace=True (or omit the argument, since True is the default), like so:

import numpy as np

a = [0, 1, 2, 3, 4, 5]
p = [0.1, 0.3, 0.3, 0.1, 0.1, 0.1]
n = 100_000

choices = np.random.choice(a, n, p=p)
values, counts = np.unique(choices, return_counts=True)
result = dict(zip(values, counts / n))

# e.g. result ≈ {0: 0.10063, 1: 0.30018, 2: 0.30003, 3: 0.09916, 4: 0.10109, 5: 0.09891} (exact values vary between runs)
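Putting both modes side by side, here is a sketch that tallies each element's share of all drawn values, the same measure the question's loop computes. It uses a seeded Generator; the seed and sample count are arbitrary choices, and the helper name shares is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
a = np.arange(6)
p = [0.1, 0.3, 0.3, 0.1, 0.1, 0.1]
n_samples, k = 20_000, 3

def shares(samples):
    """Fraction of all drawn values equal to each element of a."""
    flat = np.asarray(samples).ravel()
    return np.bincount(flat, minlength=len(a)) / flat.size

# Without replacement: one call per sample, since each sample must be distinct
freq_without = shares([rng.choice(a, k, replace=False, p=p) for _ in range(n_samples)])
# With replacement: every draw is independent, so one call covers all samples
freq_with = shares(rng.choice(a, (n_samples, k), p=p))

print(np.round(freq_without, 3))  # elements 1 and 2 land near 0.25, not 0.3
print(np.round(freq_with, 3))     # close to p
```

Only the replace=True run reproduces p; the replace=False run gives the lower shares explained above.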

Answer 1 (score: 2, accepted, 2018-07-24 09:40:13)