
Numpy random choice distribution error

I have a list of numbers and another list of probabilities that corresponds to these numbers. I use numpy.random.choice to create a random 2D array:

choice = numpy.random.choice([10, 22, 30], [10, 10], p=[0.45, 0.45, 0.10])

I expected choice to contain exactly 45 tens, 45 twenty-twos and 10 thirties, but after several runs I never get that distribution. Counting the values:

unique, counts = numpy.unique(choice, return_counts=True)
print(dict(zip(unique, counts)))

{10: 49, 22: 37, 30: 14}
{10: 47, 22: 42, 30: 11}
{10: 40, 22: 51, 30: 9}

What did I miss?

You are missing how sampling from a distribution works in practice. You never "get" the exact distribution; you always get an approximation of it, because you are sampling.
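A quick back-of-the-envelope check with the question's own numbers makes "approximation" concrete. Each of the 100 cells is an independent draw, so the number of 10s follows a binomial distribution (jointly, the three counts are multinomial); the count of 22s behaves exactly like the count of 10s:

$$N_{10} \sim \mathrm{Binomial}(n = 100,\ p = 0.45), \qquad \mathbb{E}[N_{10}] = np = 45, \qquad \sigma_{N_{10}} = \sqrt{np(1-p)} = \sqrt{100 \cdot 0.45 \cdot 0.55} = \sqrt{24.75} \approx 4.97$$

$$N_{30} \sim \mathrm{Binomial}(n = 100,\ p = 0.10), \qquad \mathbb{E}[N_{30}] = np = 10, \qquad \sigma_{N_{30}} = \sqrt{100 \cdot 0.10 \cdot 0.90} = 3$$

The observed counts (37 to 51 for 10 and 22, 9 to 14 for 30) all lie within about 1.6 standard deviations of their expected values, which is ordinary scatter for 100 draws.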

Only when the number of samples is very large will the observed frequencies converge toward the target distribution. But since sampling is a stochastic process, there is always randomness in its results.

And this of course applies to generating numbers with a (pseudo-)random number generator.
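The convergence claim is easy to check empirically. The sketch below uses NumPy's Generator API (np.random.default_rng) rather than the legacy numpy.random.choice from the question; the seed, the sample sizes and the helper name empirical_frequencies are arbitrary choices for illustration. It draws increasingly large samples with the question's probabilities and reports how far the observed frequencies land from p.

```python
import numpy as np

values = [10, 22, 30]
p = np.array([0.45, 0.45, 0.10])
rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility


def empirical_frequencies(n):
    """Draw n samples and return the observed fraction of each value in `values`."""
    sample = rng.choice(values, size=n, p=p)
    # Count against the fixed `values` list so the order always matches `p`,
    # even if some value happened not to be drawn at all.
    counts = np.array([(sample == v).sum() for v in values])
    return counts / n


for n in [100, 10_000, 1_000_000]:
    freqs = empirical_frequencies(n)
    max_dev = np.abs(freqs - p).max()
    print(f"n={n:>9}: frequencies={np.round(freqs, 4)}, max |freq - p| = {max_dev:.4f}")
```

The deviation should shrink roughly in proportion to 1/√n, but a finite sample will rarely reproduce p exactly.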

So if you flipped a coin a thousand times, you'd expect to always get exactly 500 heads?
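To see the coin analogy in numbers, the following sketch repeats the "1000 fair flips" experiment 10,000 times (both counts, and the seed, are arbitrary illustration choices) and compares the share of experiments that land on exactly 500 heads with the exact binomial probability C(1000, 500) / 2^1000.

```python
import math

import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed
# Each entry is the number of heads in one experiment of 1000 fair coin flips.
heads = rng.binomial(n=1000, p=0.5, size=10_000)

# Exact probability of getting precisely 500 heads in 1000 fair flips.
exact = math.comb(1000, 500) / 2**1000

print("min/max heads:", heads.min(), heads.max())
print("simulated share with exactly 500 heads:", np.mean(heads == 500))
print("exact probability of exactly 500 heads:", exact)
```

The exact probability is only about 0.025, so roughly 97.5% of thousand-flip experiments do not give exactly 500 heads.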

If you want to control the exact count of each result, you can't rely on probabilities. Instead, choose (without replacement) from a list in which each result appears with the desired multiplicity:

numpy.random.choice([10] * 45 + [22] * 45 + [30] * 10, [10, 10], replace=False)
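The same idea with the newer Generator API (np.random.default_rng, which the NumPy documentation recommends for new code) is sketched below; Generator.choice accepts the same replace=False option. Because the pool has exactly 100 entries and the output has 100 cells, drawing without replacement is just a random rearrangement of the pool, so the counts are always exact.

```python
import numpy as np

rng = np.random.default_rng()
# Pool containing each value with its exact desired multiplicity.
pool = [10] * 45 + [22] * 45 + [30] * 10
# Drawing all 100 items without replacement is a random permutation of the pool.
choice = rng.choice(pool, size=(10, 10), replace=False)

values, counts = np.unique(choice, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))  # {10: 45, 22: 45, 30: 10}
```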

What Matias said is true.

If you do want to create an array with exactly 45 zeros, 45 ones and 10 twos (stand-ins for the question's 10, 22 and 30), with a shape of (10, 10) but in a random order, you can do something like this:

import numpy as np
zeros = np.array([0]*45)
ones = np.array([1]*45)
twos = np.array([2]*10)
myarr = np.concatenate([zeros, ones, twos])

# Random permutation, followed by reshaping in (10, 10) form
choice = np.random.permutation(myarr).reshape(10,10)
unique, counts = np.unique(choice, return_counts=True)
print(dict(zip(unique, counts)))
{0: 45, 1: 45, 2: 10}
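The construction above can be wrapped in a small reusable function. The helper name exact_shuffle and its signature are hypothetical; the sketch builds the pool with np.repeat (which repeats values[i] exactly counts[i] times) instead of concatenating separate arrays, and shuffles with Generator.permutation.

```python
import numpy as np


def exact_shuffle(values, counts, shape, rng=None):
    """Return an array of `shape` containing values[i] exactly counts[i] times, in random order."""
    rng = np.random.default_rng() if rng is None else rng
    if sum(counts) != np.prod(shape):
        raise ValueError("counts must add up to the number of cells in shape")
    # np.repeat([0, 1, 2], [45, 45, 10]) -> 45 zeros, then 45 ones, then 10 twos
    pool = np.repeat(values, counts)
    return rng.permutation(pool).reshape(shape)


choice = exact_shuffle([0, 1, 2], [45, 45, 10], (10, 10))
unique, counts = np.unique(choice, return_counts=True)
print(dict(zip(unique.tolist(), counts.tolist())))  # {0: 45, 1: 45, 2: 10}
```

Passing values=[10, 22, 30] gives the array the question asked for.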

The sampling won't be exact. You can force all the numbers to appear in the output array by making a list of every number you want and then randomly shuffling it:

import numpy

numbers = numpy.asarray(45*[10] + 45*[22] + 10*[30])
print(numbers)
numpy.random.shuffle(numbers)  # shuffles the array in place
choice = numbers.reshape((10, 10))

print(choice)
unique, counts = numpy.unique(choice, return_counts=True)
print(dict(zip(unique, counts)))
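If you would rather keep specifying proportions such as p=[0.45, 0.45, 0.10] than literal counts, one option is to convert the proportions into integer counts first and then shuffle. Neither answer above does this; the sketch below is an assumption on my part, using largest-remainder rounding (and a hypothetical helper name, counts_from_probabilities) so the counts always add up to the array size even when p times n is not a whole number.

```python
import numpy as np


def counts_from_probabilities(p, n):
    """Turn probabilities into integer counts summing to n (largest-remainder rounding)."""
    p = np.asarray(p, dtype=float)
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("probabilities must sum to 1")
    raw = p * n
    counts = np.floor(raw).astype(int)
    # Hand the leftover slots to the entries with the largest fractional parts.
    leftover = n - counts.sum()
    order = np.argsort(raw - counts)[::-1]
    counts[order[:leftover]] += 1
    return counts


values = [10, 22, 30]
p = [0.45, 0.45, 0.10]
rng = np.random.default_rng()

counts = counts_from_probabilities(p, 100)  # [45, 45, 10]
numbers = np.repeat(values, counts)
rng.shuffle(numbers)  # in-place shuffle, like numpy.random.shuffle
choice = numbers.reshape((10, 10))
print(choice)
```

Rounding to whole counts is unavoidable here: with 100 cells and p=[1/3, 1/3, 1/3], one value has to appear 34 times.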

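Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM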