Find indices of numpy array based on values in another numpy array
I want to find the indices in a larger array where the values match those of a different, smaller array. Something like new_array below:
import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7,10,13])
new_array = np.where(summed_rows == common_sums)
However, this returns:
__main__:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
>>> new_array
(array([], dtype=int64),)
The closest I've gotten is:
new_array = [np.array(np.where(summed_rows==important_sum)) for important_sum in common_sums]
This gives me a list with three numpy arrays (one for each 'important sum'), but each has a different length, which causes further downstream problems with concatenation and vstacking. To be clear, I do not want to use the line above. I want to use numpy to index into summed_rows. I've looked at various answers using numpy.where, numpy.argwhere, and numpy.intersect1d, but am having trouble putting the ideas together. I figured I'm missing something simple and it would be faster to ask.
Thanks in advance for your recommendations!
Taking into account the options proposed in the comments, and adding an extra option using numpy's in1d:
>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0] # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1] # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0] # Option of @jdehesa
>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True
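To see why these options agree, here is a minimal sketch on a tiny, hand-picked array (the six values below are illustrative, not from the question). It shows the boolean table that broadcasting produces and how each option reads indices out of it.

```python
import numpy as np

# Small, deterministic stand-ins for summed_rows and common_sums (illustrative values).
summed_rows = np.array([7, 3, 10, 13, 7, 5])
common_sums = np.array([7, 10, 13])

# Broadcasting a (3, 1) column against a (6,) row gives a (3, 6) boolean table,
# with one row per value in common_sums.
table = summed_rows == common_sums[:, None]

# Collapse the rows: True wherever summed_rows matches any of the common sums.
idx_broadcast = table.any(axis=0).nonzero()[0]

# np.where on the 2-D table walks it row by row, so the column indices
# come out grouped by common sum rather than in ascending order.
idx_where = np.where(table)[1]

# np.isin builds the membership mask directly, without the 2-D table.
idx_isin = np.flatnonzero(np.isin(summed_rows, common_sums))

print(idx_broadcast)  # [0 2 3 4]
print(idx_where)      # [0 4 2 3]
print(idx_isin)       # [0 2 3 4]
```

The grouped order of idx_where is the reason the comparisons above sort both sides before calling np.array_equal.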
If you time them, @Brenlla's option is the fastest. The np.where variants using in1d and isin come close, while indexing np.arange with the mask takes about twice as long and @Ravi Sharma's option is the slowest:
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop
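The same comparison can be run from inside Python with the timeit module. This is only a sketch: the labels in the dictionary, the use of np.random.default_rng, and the repeat counts are my own choices, and the numbers will vary by machine and NumPy version. np.in1d is left out because, as far as I know, recent NumPy releases deprecate it in favor of np.isin.

```python
import timeit

import numpy as np

# Same shapes as in the timings above; the generator and seed are assumptions.
rng = np.random.default_rng(0)
a = rng.integers(low=1, high=14, size=9999)
b = np.array([7, 10, 13])

# Hypothetical labels mapped to the expressions being compared.
candidates = {
    "broadcast + any": lambda: (a == b[:, None]).any(0).nonzero()[0],
    "where on 2-D table": lambda: np.where(a == b[:, None])[1],
    "where + isin": lambda: np.where(np.isin(a, b))[0],
}

for name, fn in candidates.items():
    # Best of 5 repeats, 200 calls per repeat, reported per call.
    best = min(timeit.repeat(fn, number=200, repeat=5)) / 200
    print(f"{name}: {best * 1e6:.1f} usec per call")
```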
For anyone looking to match the nearest values rather than exactly equal numbers, this is a straightforward way to do the same for values that are not exactly equal. For a huge summed_rows, it might be memory intensive.
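import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7,10,13])
# Tile summed_rows into one column per common sum: shape (len(summed_rows), len(common_sums))
repeat_array = np.repeat(summed_rows, len(common_sums)).reshape(len(summed_rows), len(common_sums))
# For each common sum, the index of the closest value in summed_rows (the first one on ties)
search_index = np.argmin(np.abs(repeat_array - common_sums), axis=0)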
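As a possible simplification, broadcasting can stand in for the explicit np.repeat and reshape. The sketch below uses made-up float values (the names values and targets are mine) and returns, for each target, the index of the nearest element. It still allocates a len(values) by len(targets) intermediate, so the memory caveat applies here too.

```python
import numpy as np

# Illustrative data: values that are close to, but not exactly, the targets.
values = np.array([1.0, 6.8, 9.5, 12.9, 4.0])
targets = np.array([7, 10, 13])

# (5, 1) minus (3,) broadcasts to a (5, 3) table of absolute differences.
diffs = np.abs(values[:, None] - targets)

# For each target (column), the row index of the smallest difference.
nearest_idx = diffs.argmin(axis=0)

print(nearest_idx)          # [1 2 3]
print(values[nearest_idx])  # [ 6.8  9.5 12.9]
```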