Find indices in each column of a 2d array for every element in a 1d array without looping
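I'm trying to run a Monte Carlo simulation of a contest and allocate points based on the standings in each iteration of the simulation. I currently have a working solution using NumPy's argwhere, but for large contest sizes and simulation counts (e.g. 25000 contestants and 10000 simulations) the script is extremely slow because of the list comprehension.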
import numpy as np
#sample arrays, the actual sizes are arbitrary based on the number of contestants (ranks.shape[0]) and simulations (ranks.shape[1]) but len(field) is always equal to the number of contestants and len(points_array)
#points to be allocated based on finish position
points_array = np.array([100,50,0,0,0])
#standings for the first simulation = [2,4,3,1,0], the second = [4,0,1,2,3]
ranks = np.array([[2, 4, 4, 1],
[4, 0, 0, 4],
[3, 1, 1, 0],
[1, 2, 3, 2],
[0, 3, 2, 3]])
field = np.arange(5)
ind = [np.argwhere(ranks==i) for i in field]
roi = [np.sum(points_array[i[:,0]]) for i in ind]
print(roi)
Returns: [100, 100, 100, 0, 300]
This solution works and is very fast for small arrays:
%timeit [np.argwhere(ranks==i) for i in field]
36.2 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
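However, for large contestant fields and many simulations, the script hangs on the list comprehension that uses argwhere (for a field of 30k contestants and 10k simulations it was still running after 20 minutes on my machine, with no memory constraints). Is there a way to vectorize argwhere, or to reduce the complexity of the lookup, to speed up finding the rank indices for every element in field?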
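You can use numpy.argsort to vectorize this behavior. This works because your final array is ordered by contestant, and for each contestant you need the index of their finishing position in the ranks array.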
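As a small illustration of why this works, here is a sketch on a single simulation. It assumes, as in the question's sample data, that each column of ranks is a permutation of 0..n-1 listing contestants in finishing order; the names standings and positions are only used for this illustration.

```python
import numpy as np

points_array = np.array([100, 50, 0, 0, 0])

# First simulation from the question (column 0 of ranks):
# row index = finishing position, value = contestant id.
standings = np.array([2, 4, 3, 1, 0])

# argsort of a permutation is its inverse permutation, so
# positions[c] is the finishing position (row index) of contestant c.
positions = np.argsort(standings)
print(positions)  # [4 3 0 2 1]

# Indexing points_array with those positions gives points per contestant.
points = points_array[positions]
print(points)  # [  0   0 100   0  50]
```

Contestant 2 finished first and gets 100 points, and contestant 4 finished second and gets 50. Applying argsort with axis=0 does the same thing for every column at once.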
import numpy as np
#sample arrays, the actual sizes are arbitrary based on the number of contestants (ranks.shape[0]) and simulations (ranks.shape[1]) but len(field) is always equal to the number of contestants and len(points_array)
#points to be allocated based on finish position
points_array = np.array([100,50,0,0,0])
#standings for the first simulation = [2,4,3,1,0], the second = [4,0,1,2,3]
ranks = np.array([[2, 4, 4, 1],
[4, 0, 0, 4],
[3, 1, 1, 0],
[1, 2, 3, 2],
[0, 3, 2, 3]])
roi = points_array[np.argsort(ranks, axis=0)].sum(axis=1)
print(roi)
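If you want to convince yourself that the two approaches agree before running the full-size simulation, a rough check along these lines may help. It is only a sketch: the function names roi_loop and roi_argsort, the random seed, and the sizes (50 contestants, 200 simulations) are made up for the test, and it assumes every column of ranks is a permutation of the contestant ids.

```python
import numpy as np

def roi_loop(ranks, points_array):
    # The question's approach: one full argwhere scan per contestant.
    field = np.arange(ranks.shape[0])
    ind = [np.argwhere(ranks == i) for i in field]
    return np.array([np.sum(points_array[i[:, 0]]) for i in ind])

def roi_argsort(ranks, points_array):
    # The vectorized approach: invert each column's permutation with argsort.
    return points_array[np.argsort(ranks, axis=0)].sum(axis=1)

# The sample data from the question.
sample_ranks = np.array([[2, 4, 4, 1],
                         [4, 0, 0, 4],
                         [3, 1, 1, 0],
                         [1, 2, 3, 2],
                         [0, 3, 2, 3]])
sample_points = np.array([100, 50, 0, 0, 0])
print(roi_argsort(sample_ranks, sample_points))  # [100 100 100   0 300]

# Hypothetical random test data: each column is an independent random permutation.
rng = np.random.default_rng(0)
n_contestants, n_sims = 50, 200
ranks = np.stack([rng.permutation(n_contestants) for _ in range(n_sims)], axis=1)
points_array = np.zeros(n_contestants, dtype=int)
points_array[:2] = [100, 50]

loop_result = roi_loop(ranks, points_array)
argsort_result = roi_argsort(ranks, points_array)
assert np.array_equal(loop_result, argsort_result)
```

The loop version scans the whole ranks array once per contestant, while the argsort version sorts each column once, which is why it scales much better as the field grows. Note that the intermediate argsort result has the same shape as ranks, so at the sizes mentioned in the question it will need a comparable amount of memory.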