
Find indices of numpy array based on values in another numpy array

I want to find the indices of the elements in a larger array whose values match any of the values in a different, smaller array. Something like new_array below:

import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7,10,13])
new_array = np.where(summed_rows == common_sums)

However, this returns:

__main__:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future. 
>>>new_array 
(array([], dtype=int64),)
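The failure comes from broadcasting: NumPy compares two arrays elementwise only when their shapes are compatible, and shapes (9999,) and (3,) are not. Adding an axis to the smaller array gives shapes (3, 1) and (9999,), which broadcast to a (3, 9999) grid of comparisons. A small sketch that checks this with np.broadcast_shapes (available in NumPy 1.20 and later); can_broadcast is a hypothetical helper for illustration:

```python
import numpy as np

def can_broadcast(*shapes):
    """Hypothetical helper: return True if the given shapes broadcast together."""
    try:
        np.broadcast_shapes(*shapes)
        return True
    except ValueError:
        return False

summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7, 10, 13])

print(can_broadcast(summed_rows.shape, common_sums.shape))           # False
print(can_broadcast(summed_rows.shape, common_sums[:, None].shape))  # True

# With the extra axis, row i of the mask compares summed_rows to common_sums[i].
mask = summed_rows == common_sums[:, None]
print(mask.shape)  # (3, 9999)
```

Reducing that mask over its first axis (for example with .any(0)) yields one boolean per element of summed_rows, which is what the answers below build on.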

The closest I've gotten is:

new_array = [np.array(np.where(summed_rows == important_sum)) for important_sum in common_sums]

This gives me a list of three numpy arrays (one for each 'important sum'), but each has a different length, which causes further downstream problems with concatenation and vstacking. To be clear, I do not want to use the line above. I want to use numpy to index into summed_rows. I've looked at various answers using numpy.where, numpy.argwhere, and numpy.intersect1d, but am having trouble putting the ideas together. I figured I'm missing something simple and it would be faster to ask.

Thanks in advance for your recommendations!
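One reason np.intersect1d does not fit this problem on its own: with return_indices=True (NumPy 1.15 and later) it returns only the index of the first occurrence of each common value, not every matching position. A small deterministic example with made-up data:

```python
import numpy as np

summed_rows = np.array([7, 1, 10, 7, 13, 2, 10])
common_sums = np.array([7, 10, 13])

# intersect1d returns the sorted common values plus the index of the
# FIRST occurrence of each value in each input array.
values, idx_rows, idx_sums = np.intersect1d(summed_rows, common_sums, return_indices=True)
print(values)    # [ 7 10 13]
print(idx_rows)  # [0 2 4]  (positions 3 and 6 are missing)

# np.isin marks every matching element, so all positions are recovered.
all_idx = np.flatnonzero(np.isin(summed_rows, common_sums))
print(all_idx)   # [0 2 3 4 6]
```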

Taking into account the options proposed in the comments, and adding an extra option using numpy's in1d:

>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0]   # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1]   # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]   # np.in1d is deprecated in NumPy 2.0; np.isin replaces it
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]   # same caveat as ind_3
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0]   # Option of @jdehesa

>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True
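The np.sort calls above are needed because the options do not all return indices in the same order: np.where on the (3, N) mask walks it row by row, so ind_2 is grouped by matched value rather than sorted by position. A small deterministic illustration with made-up data (the row indices from np.where also tell you which value matched):

```python
import numpy as np

summed_rows = np.array([13, 7, 10, 7, 2, 13])
common_sums = np.array([7, 10, 13])

mask = summed_rows == common_sums[:, None]   # shape (3, 6): one row per value

ind_1 = mask.any(0).nonzero()[0]             # positions in ascending order
rows, ind_2 = np.where(mask)                 # row-major, so grouped by common_sums

print(ind_1)                # [0 1 2 3 5]
print(ind_2)                # [1 3 2 0 5]
print(common_sums[rows])    # [ 7  7 10 13 13]  value matched at each index in ind_2
```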

If you time them, most options are within a factor of two of each other. @Brenlla's option is the fastest, and the np.where(a == b[:, None])[1] option is the slowest (more than three times slower in these runs):

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop
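The same comparison can also be run from a Python script with the standard-library timeit module. This is a sketch: it uses NumPy's Generator API (np.random.default_rng) instead of np.random.seed with randint, so the data differs from the runs above; the np.flatnonzero variant is an extra candidate not in the original list; and np.in1d is left out because it is deprecated in NumPy 2.0. Absolute numbers depend on the machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(low=1, high=14, size=9999)
b = np.array([7, 10, 13])

# Each candidate returns the positions in `a` whose value appears in `b`.
candidates = {
    "any/nonzero": lambda: (a == b[:, None]).any(0).nonzero()[0],
    "where on 2-D mask": lambda: np.where(a == b[:, None])[1],
    "where(isin)": lambda: np.where(np.isin(a, b))[0],
    "flatnonzero(isin)": lambda: np.flatnonzero(np.isin(a, b)),
}

NUMBER = 200  # calls per timing run
results = {}
for name, fn in candidates.items():
    # Best of 3 runs, converted to microseconds per call.
    best = min(timeit.repeat(fn, number=NUMBER, repeat=3)) / NUMBER * 1e6
    results[name] = best
    print(f"{name:>20}: {best:8.1f} usec per call")

# Sanity check: all candidates find the same set of indices.
reference = np.sort(candidates["any/nonzero"]())
assert all(np.array_equal(np.sort(fn()), reference) for fn in candidates.values())
```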

For anyone looking for the nearest value rather than an exact match, this is a straightforward way to do the same when the values are not exactly equal: for each value in common_sums it returns the index of the first element of summed_rows closest to it. For a huge summed_rows it might be memory intensive, since it builds arrays of len(summed_rows) * len(common_sums) elements.

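import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7,10,13])

# Row i of repeat_array holds summed_rows[i], repeated once per common sum
repeat_array = np.repeat(summed_rows, len(common_sums)).reshape(len(summed_rows), len(common_sums))
# For each common sum, the index of the first closest element of summed_rows
search_index = np.argmin(np.abs(repeat_array - common_sums), axis=0)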
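If that memory cost matters, one alternative (a different technique from the answer above, sketched under the assumption that summed_rows is a 1-D array of a signed integer or float dtype) is to sort summed_rows once and use np.searchsorted to find the neighbours on either side of each value in common_sums. Memory then grows with len(summed_rows) + len(common_sums) rather than their product. When several elements are equally close, it may return a different, equally near index than np.argmin does. nearest_indices is a hypothetical helper name:

```python
import numpy as np

def nearest_indices(haystack, needles):
    """Hypothetical helper: for each value in `needles`, return the index of
    an element of the 1-D array `haystack` that is closest to it."""
    haystack = np.asarray(haystack)
    needles = np.asarray(needles)
    order = np.argsort(haystack, kind="stable")
    sorted_vals = haystack[order]

    # Position where each needle would be inserted to keep sorted_vals sorted.
    pos = np.searchsorted(sorted_vals, needles)
    left = np.clip(pos - 1, 0, len(sorted_vals) - 1)
    right = np.clip(pos, 0, len(sorted_vals) - 1)

    # Pick whichever neighbour is closer (ties go to the left neighbour).
    use_right = np.abs(sorted_vals[right] - needles) < np.abs(sorted_vals[left] - needles)
    best = np.where(use_right, right, left)
    return order[best]  # map back to positions in the unsorted array

summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7, 10, 13])
search_index = nearest_indices(summed_rows, common_sums)

# Cross-check the distances against a broadcasting version of the argmin approach.
brute = np.argmin(np.abs(summed_rows[:, None] - common_sums), axis=0)
assert np.array_equal(np.abs(summed_rows[search_index] - common_sums),
                      np.abs(summed_rows[brute] - common_sums))
```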

Use np.isin:

import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7, 10, 13])
new_array = np.where(np.isin(summed_rows, common_sums))
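Note that np.where with a single argument returns a tuple with one index array per dimension, so new_array above is a 1-tuple; take new_array[0] (or use np.flatnonzero) to get a flat array of positions. np.isin also accepts invert=True to select positions whose values are not in common_sums, and it works on inputs of any shape. A small deterministic example with made-up data:

```python
import numpy as np

summed_rows = np.array([3, 7, 10, 1, 13, 7])
common_sums = np.array([7, 10, 13])

mask = np.isin(summed_rows, common_sums)   # boolean, same shape as summed_rows
print(np.where(mask))                      # a 1-tuple holding one index array
print(np.flatnonzero(mask))                # [1 2 4 5]
print(np.flatnonzero(np.isin(summed_rows, common_sums, invert=True)))  # [0 3]

# For a 2-D input, np.argwhere gives one (row, col) pair per match.
grid = summed_rows.reshape(2, 3)           # [[3 7 10], [1 13 7]]
print(np.argwhere(np.isin(grid, common_sums)))  # pairs (0,1), (0,2), (1,1), (1,2)
```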
