
Weighted Random Sampling from 2d numpy array

I have a 2d numpy array Z and I want to randomly choose an index of Z where the chance of an index being chosen is proportional to the value of Z at that index.

Right now, I'm doing the following:

yar = list(np.ndenumerate(Z))
x,y = yar[np.random.choice(len(yar), p=Z.ravel()/Z.sum())][0]

This does the job but feels hideous (and is extremely slow besides). Is there a better way?

We can optimize by avoiding the creation of yar. We simply get the equivalent linear index from np.random.choice and convert it to dimension indices with np.unravel_index, which gives us x and y.

So, the implementation would be -

linear_idx = np.random.choice(Z.size, p=Z.ravel()/float(Z.sum()))
x, y = np.unravel_index(linear_idx, Z.shape)
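
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The function name sample_index and the use of np.random.default_rng are my own choices for illustration, not part of the snippet above, and it assumes Z is non-negative with a positive sum.

```python
import numpy as np


def sample_index(Z, rng=None):
    """Draw one (row, col) index of Z with probability proportional to Z[row, col].

    Hypothetical helper for illustration; assumes Z is non-negative
    and has a positive sum.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(Z, dtype=float)
    # Flatten the weights and normalise them into a probability vector.
    p = weights.ravel() / weights.sum()
    # Pick a position in the flattened array, then map it back to 2d.
    linear_idx = rng.choice(weights.size, p=p)
    return np.unravel_index(linear_idx, weights.shape)


# Only Z[1, 2] is nonzero here, so it is the only index that can be drawn.
Z = np.array([[0, 0, 0],
              [0, 0, 5]])
x, y = sample_index(Z)
print(x, y)
```

Casting to float up front plays the same role as the float(Z.sum()) in the snippet above: it keeps the division from truncating when Z holds integers.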

To give some context on how much of a bottleneck the creation of yar was in that setup, here's a sample timing test -

In [402]: Z = np.random.randint(0,9,(300,400))

In [403]: yar = list(np.ndenumerate(Z))

In [404]: %timeit list(np.ndenumerate(Z))
10 loops, best of 3: 46.3 ms per loop

In [405]: %timeit yar[np.random.choice(len(yar), p=Z.ravel()/float(Z.sum()))][0]
1000 loops, best of 3: 1.34 ms per loop

In [406]: 46.3/(46.3+1.34)
Out[406]: 0.971872376154492

So, creating yar was eating up 97% of the runtime there.
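
If you want to reproduce the comparison outside IPython, something like the following rough sketch with the standard timeit module should work. The helper names with_enumerate and with_unravel are made up for this example, the probability vector is precomputed so both variants share that cost, and the numbers you get will depend on your machine and NumPy version.

```python
import timeit

import numpy as np

Z = np.random.randint(0, 9, (300, 400))
# Shared by both variants so that only the indexing strategy differs.
p = Z.ravel() / float(Z.sum())


def with_enumerate():
    # Original approach: build the full list of ((x, y), value) pairs first.
    yar = list(np.ndenumerate(Z))
    return yar[np.random.choice(len(yar), p=p)][0]


def with_unravel():
    # Suggested approach: draw a linear index and convert it to (x, y).
    return np.unravel_index(np.random.choice(Z.size, p=p), Z.shape)


runs = 5
t_enum = min(timeit.repeat(with_enumerate, number=runs, repeat=3)) / runs
t_unravel = min(timeit.repeat(with_unravel, number=runs, repeat=3)) / runs
print("ndenumerate: %.2f ms per call" % (t_enum * 1e3))
print("unravel_index: %.2f ms per call" % (t_unravel * 1e3))
```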
