np.random.choice: probabilities do not sum to 1
How can I use np.random.choice here? I have a p that is calculated by some operation, for example:
p=[ 1.42836755e-01, 1.42836735e-01 , 1.42836735e-01, 1.42836735e-01
, 4.76122449e-05, 1.42836735e-01 , 4.76122449e-05 , 1.42836735e-01,
1.42836735e-01, 4.76122449e-05]
Usually sum(p) is not exactly equal to 1:
>>> sum(p)
1.0000000017347
I want to make a random choice with probabilities p:
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([4, 3, 2, 9])
This works here, but in the program it raises an error:
Traceback (most recent call last):
indexs=np.random.choice(range(len(population)), population_number, p=p, replace=False)
File "mtrand.pyx", line 1141, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:17808)
ValueError: probabilities do not sum to 1
If I print p:
[ 4.17187500e-05 2.49937500e-01 4.16562500e-05 4.16562500e-05
2.49937500e-01 4.16562500e-05 4.16562500e-05 4.16562500e-05
2.49937500e-01 2.49937500e-01]
But it works in the Python shell with this p:
>>> p=[ 4.17187500e-05 , 2.49937500e-01 ,4.16562500e-05 , 4.16562500e-05,
2.49937500e-01 , 4.16562500e-05 , 4.16562500e-05 , 4.16562500e-05,
2.49937500e-01 ,2.49937500e-01]
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([ 9, 10, 2, 5])
UPDATE: I have tested it with precision=15:
np.set_printoptions(precision=15)
print(p)
[ 2.499375625000002e-01 2.499375000000000e-01 2.499375000000000e-01
4.165625000000000e-05 4.165625000000000e-05 4.165625000000000e-05
4.165625000000000e-05 4.165625000000000e-05 2.499375000000000e-01
4.165625000000000e-05]
Testing:
>>> p=np.array([ 2.499375625000002e-01 ,2.499375000000000e-01 ,2.499375000000000e-01,
4.165625000000000e-05 ,4.165625000000000e-05, 4.165625000000000e-05,
4.165625000000000e-05 , 4.165625000000000e-05 , 2.499375000000000e-01,
4.165625000000000e-05])
>>> np.sum(p)
1.0000000000000002
How can I fix this so that I can use np.random.choice?
This is a known issue with numpy. The random choice function checks the sum of the probabilities against a given tolerance (see the numpy source).
The solution is to normalize the probabilities by dividing them by their sum, provided the sum is close enough to 1.
Example:
>>> p=[ 1.42836755e-01, 1.42836735e-01 , 1.42836735e-01, 1.42836735e-01
, 4.76122449e-05, 1.42836735e-01 , 4.76122449e-05 , 1.42836735e-01,
1.42836735e-01, 4.79122449e-05]
>>> sum(p)
1.0000003017347 # over tolerance limit
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
Traceback (most recent call last):
File "<pyshell#23>", line 1, in <module>
np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
File "mtrand.pyx", line 1417, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:15985)
ValueError: probabilities do not sum to 1
With normalization:
>>> p = np.array(p)
>>> p /= p.sum() # normalize
>>> np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
array([8, 4, 1, 6])
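The normalization above can be wrapped in a small helper that only rescales when the sum is already close to 1, which is the condition the answer attaches to this fix. This is a minimal sketch: the function name normalize_probabilities and the tol value of 1e-6 are illustrative assumptions, not part of numpy.

```python
import numpy as np

def normalize_probabilities(p, tol=1e-6):
    """Rescale p to sum to 1, but only if it is already close to 1.

    `tol` is an assumed, illustrative threshold; choose it to match how
    much rounding drift your own computation can plausibly produce.
    """
    p = np.asarray(p, dtype=np.float64)
    total = p.sum()
    # Refuse to "fix" input that is not a probability vector at all.
    if np.any(p < 0) or abs(total - 1.0) > tol:
        raise ValueError(f"p does not look like a probability vector (sum={total!r})")
    return p / total

# The p from the example above, whose sum is slightly above 1.
p = [1.42836755e-01, 1.42836735e-01, 1.42836735e-01, 1.42836735e-01,
     4.76122449e-05, 1.42836735e-01, 4.76122449e-05, 1.42836735e-01,
     1.42836735e-01, 4.79122449e-05]

p = normalize_probabilities(p)
sample = np.random.choice([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4, p=p, replace=False)
print(sample)
```

Raising on sums that are far from 1 keeps a genuine bug in the probability calculation from being silently hidden by the rescaling.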
One way to see the difference is:
numpy.set_printoptions(precision=15)
print(p)
This will perhaps show you that your 4.17187500e-05 is actually 4.17187500005e-05. See the numpy manual for set_printoptions.
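Besides printing more digits, it can help to measure directly how far the sum is from 1. The sketch below does that for the two p lists that appear in this thread. The threshold ASSUMED_TOL is an assumption about the order of magnitude of the tolerance that np.random.choice applies (the square root of the float64 machine epsilon); the exact check may differ between numpy versions, and sum_deviation is a hypothetical helper name.

```python
import numpy as np

# Assumption: a tolerance on the order of sqrt(float64 eps), about 1.5e-8.
# The real check inside numpy may differ depending on the version.
ASSUMED_TOL = np.sqrt(np.finfo(np.float64).eps)

def sum_deviation(p):
    """Return |sum(p) - 1| computed in float64."""
    return abs(float(np.sum(np.asarray(p, dtype=np.float64))) - 1.0)

# p from the question: accepted by np.random.choice in the shell.
p_ok = [1.42836755e-01, 1.42836735e-01, 1.42836735e-01, 1.42836735e-01,
        4.76122449e-05, 1.42836735e-01, 4.76122449e-05, 1.42836735e-01,
        1.42836735e-01, 4.76122449e-05]

# p from the first answer: rejected with "probabilities do not sum to 1".
p_bad = [1.42836755e-01, 1.42836735e-01, 1.42836735e-01, 1.42836735e-01,
         4.76122449e-05, 1.42836735e-01, 4.76122449e-05, 1.42836735e-01,
         1.42836735e-01, 4.79122449e-05]

for name, probs in (("p_ok", p_ok), ("p_bad", p_bad)):
    dev = sum_deviation(probs)
    print(name, "deviation:", dev, "exceeds assumed tolerance:", dev > ASSUMED_TOL)
```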
Convert it to float64:
p = np.asarray(p).astype('float64')
p = p / np.sum(p)
np.random.choice([1,2,3,4,5,6,7,8,9, 10], 4, p=p, replace=False)
This was inspired by another post: How can I avoid value errors when using numpy.random.multinomial?
ValueError: probabilities do not sum to 1
This is a known numpy issue. The error comes from floating-point rounding: the probabilities may sum to something like 0.9999999999997 or 1.0000000000003 instead of exactly 1, and when the deviation exceeds the tolerance that np.random.choice() allows, the call fails.
There is a workaround: np.random.multinomial(). This method handles probabilities more gracefully, without requiring them to sum to exactly 1.0. Its documentation describes the pvals parameter as follows:
pvals : sequence of floats, length p. Probabilities of each of the p different outcomes. These should sum to 1 (however, the last element is always assumed to account for the remaining probability, as long as sum(pvals[:-1]) <= 1).
For example, I have some choices and normalized_weights associated with them.
np.random.multinomial() draws 20 times based on normalized_weights and returns how many times each choice was chosen.
import numpy as np

choices = [......]
weights = np.array([......])
normalized_weights = weights / np.sum(weights)
number_of_choices = 20
# resample_counts[i] is how many times choices[i] was drawn
resample_counts = np.random.multinomial(number_of_choices,
                                        normalized_weights)
chosen = []
resample_index = 0
for resample_count in resample_counts:
    # append choices[resample_index] once per draw
    for _ in range(resample_count):
        chosen.append(choices[resample_index])
    resample_index += 1
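The nested loop above can also be written with np.repeat, which repeats each choice according to its count. This is a minimal sketch with made-up values for choices and weights (both are hypothetical, since the original elides them). Note that multinomial draws with replacement, so the same choice can appear several times; it is not a drop-in substitute for np.random.choice(..., replace=False) as used in the question.

```python
import numpy as np

# Hypothetical example data; the original post does not show real values.
choices = np.array(["a", "b", "c", "d"])
weights = np.array([1.0, 2.0, 3.0, 4.0])

normalized_weights = weights / np.sum(weights)
number_of_choices = 20

# resample_counts[i] is how many times choices[i] was drawn.
resample_counts = np.random.multinomial(number_of_choices, normalized_weights)

# Repeat each choice by its count instead of looping explicitly.
chosen = np.repeat(choices, resample_counts)
print(resample_counts, chosen)
```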