Find indices in each column of a 2d array for every element in a 1d array without looping

I'm trying to run a Monte Carlo simulation of a contest and allocate points based on the standings in each iteration of the sim. I currently have a working solution using NumPy's argwhere, but for large contest sizes and simulation counts (e.g. 25,000 contestants and 10,000 simulations) the script is extremely slow because of the list comprehensions.

import numpy as np

#sample arrays; the actual sizes are arbitrary, based on the number of contestants (ranks.shape[0]) and simulations (ranks.shape[1]), but len(field) is always equal to the number of contestants and to len(points_array)

#points to be allocated based on finish position
points_array = np.array([100,50,0,0,0])

#standings for the first simulation = [2,4,3,1,0], the second = [4,0,1,2,3]
ranks = np.array([[2, 4, 4, 1],
 [4, 0, 0, 4],
 [3, 1, 1, 0],
 [1, 2, 3, 2],
 [0, 3, 2, 3]])

field = np.arange(5)

ind = [np.argwhere(ranks==i) for i in field]
roi = [np.sum(points_array[i[:,0]]) for i in ind]
print(roi)

Returns: [100, 100, 100, 0, 300]

This solution works and is very fast for small arrays:

%timeit [np.argwhere(ranks==i) for i in field]
36.2 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

However, for large contestant fields and many simulations the script hangs on the list comprehension that uses argwhere (it was still running after 20 minutes for a 30k-person field and 10k simulations, with no memory constraints on my machine). Is there a way to vectorize argwhere, or to reduce the complexity of the lookup, to speed up finding the indices in ranks for every element of field?

You can use numpy.argsort to vectorize this behavior. This works because the final array is ordered by contestant, so for each contestant you need the index of their finish position in the ranks array. Each column of ranks is a permutation of the contestant ids, and the argsort of a permutation is its inverse: np.argsort(ranks, axis=0)[i, s] is the finish position of contestant i in simulation s.
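To see the inverse-permutation idea on its own, here is a minimal sketch for a single simulation, using the first column of the sample ranks array. The names standings and position_of are illustrative only and do not appear in the original code.

```python
import numpy as np

# One simulation: standings[p] is the contestant who finished in position p.
standings = np.array([2, 4, 3, 1, 0])

# argsort of a permutation is its inverse:
# position_of[c] is the position in which contestant c finished.
position_of = np.argsort(standings)
print(position_of.tolist())  # [4, 3, 0, 2, 1]

# Indexing the points by finish position gives each contestant's points.
points_array = np.array([100, 50, 0, 0, 0])
print(points_array[position_of].tolist())  # [0, 0, 100, 0, 50]
```

Contestant 2 finished first and gets 100, contestant 4 finished second and gets 50. Applying the same step to every column with axis=0 and summing across columns gives the per-contestant totals.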

import numpy as np

#sample arrays; the actual sizes are arbitrary, based on the number of contestants (ranks.shape[0]) and simulations (ranks.shape[1]), but len(field) is always equal to the number of contestants and to len(points_array)

#points to be allocated based on finish position
points_array = np.array([100,50,0,0,0])

#standings for the first simulation = [2,4,3,1,0], the second = [4,0,1,2,3]
ranks = np.array([[2, 4, 4, 1],
 [4, 0, 0, 4],
 [3, 1, 1, 0],
 [1, 2, 3, 2],
 [0, 3, 2, 3]])

roi = points_array[np.argsort(ranks, axis=0)].sum(axis=1)
print(roi)
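For the sample arrays this prints [100 100 100   0 300], matching the loop version. As a rough sanity check on larger inputs, the sketch below compares the argsort result against the original loop on random permutations. The sizes, the seed, and the payout are made-up values for illustration. It also tries np.bincount with weights as a sort-free alternative; that is a suggestion of mine, not part of the answer above, and it assumes every entry of ranks is a contestant id between 0 and n_contestants - 1.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n_contestants, n_sims = 50, 200  # hypothetical sizes, small enough for the loop

# Each column is one simulation: a random permutation of the contestant ids.
ranks = np.stack([rng.permutation(n_contestants) for _ in range(n_sims)], axis=1)

# Hypothetical payout: points for the top two finish positions only.
points_array = np.zeros(n_contestants, dtype=np.int64)
points_array[:2] = [100, 50]

# Reference result: the argwhere loop from the question.
roi_loop = np.array([points_array[np.argwhere(ranks == i)[:, 0]].sum()
                     for i in range(n_contestants)])

# Vectorized result: argsort along axis 0 inverts each column's permutation.
roi_argsort = points_array[np.argsort(ranks, axis=0)].sum(axis=1)

# Sort-free alternative (assumption, not from the answer): accumulate the
# points of each row directly onto the contestant id stored in that cell.
# ranks.ravel() walks row by row, so row p's points repeat n_sims times.
weights = np.repeat(points_array, n_sims)
roi_bincount = np.bincount(ranks.ravel(), weights=weights,
                           minlength=n_contestants).astype(np.int64)

print(np.array_equal(roi_loop, roi_argsort),
      np.array_equal(roi_loop, roi_bincount))  # True True
```

The loop scans the whole ranks array once per contestant, while argsort sorts each column once and bincount makes a single pass over the array, which is why both should scale much better to fields of tens of thousands of contestants.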
