High-performance weighted random choice for Python 2?

Question

I have the following Python function, which picks a random element from the sequence "seq", weighted by a second sequence that holds the weight of each element in seq:

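import random

def weighted_choice(seq, weights):
    assert len(seq) == len(weights)

    total = sum(weights)
    r = random.uniform(0, total)  # random point on the cumulative weight line
    upto = 0
    # Linear scan: return the first element whose cumulative weight reaches r.
    for i in range(len(seq)):
        if upto + weights[i] >= r:
            return seq[i]
        upto += weights[i]
    assert False, "Shouldn't get here"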

If I call it a million times with a 1000-element sequence, like this:

import random
import time

seq = range(1000)
weights = []
for i in range(1000):
    weights.append(random.randint(1, 100))

st = time.time()
for i in range(1000000):
    r = weighted_choice(seq, weights)
print(time.time() - st)

It runs for approximately 45 seconds in CPython 2.7 and 70 seconds in CPython 3.6. It finishes in around 2.3 seconds in PyPy 5.10, which would be fine for me, but unfortunately I can't use PyPy.

Any ideas on how to speed up this function on CPython? I'm also interested in other implementations (algorithmic, or via external libraries such as numpy) if they perform better.

PS: Python 3 has random.choices, which accepts weights. It runs for around 23 seconds, which is better than the function above but still about ten times slower than PyPy.

I've also tried it with numpy, this way:

import time
import numpy

weights = [1./1000] * 1000  # uniform probabilities that sum to 1
st = time.time()
for i in range(1000000):
    #r=weighted_choice(seq, weights)
    #r=random.choices(seq, weights)
    r = numpy.random.choice(seq, p=weights)
print(time.time() - st)

It ran for 70 seconds.

Answer 1

You can use numpy.random.choice (the p parameter takes the weights as probabilities). Normally numpy functions are vectorized and so run at near-C speed.

Implement as:

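import numpy as np

def weighted_choice(seq, weights):
    w = np.asarray(weights, dtype=float)  # float avoids integer division on Python 2
    p = w / w.sum()  # can skip if weights always sum to 1
    return np.random.choice(seq, p=p)  # pass the normalized probabilities, not the raw weights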

Edit:

Timings:

%timeit np.random.choice(x, p=w)  # len(x) == 1_000_000
13 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.random.choice(y, p=w)  # len(y) == 100_000_000
1.28 s ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
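
The loop in the question calls np.random.choice once per sample, so the per-call setup work is repeated a million times. A minimal sketch, assuming all samples can be requested up front, is to draw them in a single call through the size parameter. The helper name weighted_choices and its n parameter are illustrative and not part of the original answer:

```python
import numpy as np

def weighted_choices(seq, weights, n):
    """Draw n weighted samples from seq in one vectorized call (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()  # normalize so the probabilities sum to 1
    return np.random.choice(seq, size=n, p=p)

seq = np.arange(1000)
weights = np.random.randint(1, 101, 1000)  # integers 1..100; the upper bound is exclusive
samples = weighted_choices(seq, weights, 1000000)
```

Here samples is an array of one million elements of seq. No timing is claimed for this variant, since it was not measured in the thread.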

Answer 2

You could take this approach with numpy. If you eliminate the for loop, you can get the true power of numpy by indexing the positions you need:

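import numpy as np

# Setup (untimed, as in the question)
seq = np.arange(1000)
weights = np.random.randint(1, 100, (1000, 1))  # upper bound is exclusive: weights 1..99


def weights_numpy(seq, weights, iterations):
    """
    :param seq: Input sequence
    :param weights: Input weights, as a column of shape (len(seq), 1)
    :param iterations: Number of samples to draw
    :return: Array of `iterations` elements drawn from seq
    """
    r = np.random.uniform(0, weights.sum(0), (1, iterations))  # one random point per draw
    ar = weights.cumsum(0)  # cumulative sum of the weights
    # (ar >= r) has shape (len(seq), iterations); argmax finds the first True per column
    return seq[(ar >= r).argmax(0)]  # indices of seq that meet the condition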

And the timing (Python 3, numpy '1.14.0'):

%timeit weights_numpy(seq,weights,1000000)
4.05 s ± 256 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Which is a bit slower than PyPy, but hardly...
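
The comparison ar >= r in the function above builds a boolean array of shape (len(seq), iterations), which for 1000 elements and a million draws is about a gigabyte. One possible alternative, sketched here rather than taken from the answer, is a binary search over the cumulative weights with np.searchsorted. The name weights_searchsorted is hypothetical, and the sketch assumes one-dimensional, strictly positive weights:

```python
import numpy as np

def weights_searchsorted(seq, weights, iterations):
    """Draw `iterations` weighted samples via binary search (illustrative sketch)."""
    cum = np.cumsum(np.asarray(weights, dtype=float))  # 1-D cumulative weights
    r = np.random.uniform(0, cum[-1], iterations)      # one random point per draw
    idx = np.searchsorted(cum, r, side="left")         # first index with cum[idx] >= r
    return np.asarray(seq)[idx]
```

It selects the same index as the argmax version (the first position whose cumulative weight reaches r) but only allocates arrays of length iterations. It was not timed in the thread, so no speed figure is given here.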
