
Getting the indices of several elements in a NumPy array at once

Is there any way to get the indices of several elements in a NumPy array at once?

For example:

import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])

I would like to find the index of each element of a in b, namely: [0, 1, 4].

I find the solution I am using a bit verbose:

import numpy as np

a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])

c = np.zeros_like(a)
for i, aa in np.ndenumerate(a):
    c[i] = np.where(b == aa)[0][0]  # first index in b where the value equals aa
    
print('c: {0}'.format(c))

Output:

c: [0 1 4]

You could use in1d and nonzero (or where, for that matter):

>>> np.in1d(b, a).nonzero()[0]
array([0, 1, 4])

This works fine for your example arrays, but in general the array of returned indices does not honour the order of the values in a. This may be a problem depending on what you want to do next.

In that case, a much better answer is the one @Jaime gives, using searchsorted:

>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([0, 1, 4])

This returns the indices in the order in which the values appear in a. For instance:

a = np.array([1, 2, 4])
b = np.array([4, 2, 3, 1])

>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([3, 1, 0]) # the other method would return [0, 1, 3]
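The searchsorted approach assumes that every value of a actually occurs in b. For a value that is missing it silently returns the index of some other element, or raises an IndexError when the value is larger than everything in b. A minimal sketch of one way to guard against that is shown here; the helper name find_indices and the -1 sentinel are illustrative choices, not part of NumPy, and b is assumed to be non-empty:

```python
import numpy as np

def find_indices(a, b, missing=-1):
    """Index in b of each value of a; `missing` where a value is absent from b."""
    a = np.asarray(a)
    b = np.asarray(b)
    sorter = np.argsort(b)
    pos = np.searchsorted(b, a, sorter=sorter)
    # searchsorted returns len(b) for values above the maximum; clip before indexing
    pos = np.clip(pos, 0, len(b) - 1)
    idx = sorter[pos]
    # keep an index only where b really holds the requested value
    return np.where(b[idx] == a, idx, missing)

a = np.array([1, 2, 4, 7, 0])
b = np.array([4, 2, 3, 1])
result = find_indices(a, b)
print(result)  # [ 3  1  0 -1 -1]
```

Comparing b[idx] against a afterwards is what tells a genuine match apart from a mere insertion point.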

This is a simple one-liner using the numpy-indexed package (disclaimer: I am its author):

import numpy_indexed as npi
idx = npi.indices(b, a)

The implementation is fully vectorized, and it gives you control over the handling of missing values. Moreover, it works for nd-arrays as well (for instance, finding the indices of rows of a in b).

For an order-agnostic solution, you can use np.flatnonzero with np.isin (v1.13+).

import numpy as np

a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])

# Test the elements of b for membership in a, so that the indices refer to b.
res = np.flatnonzero(np.isin(b, a))  # NumPy v1.13+
# res = np.flatnonzero(np.in1d(b, a))  # earlier versions

# array([0, 1, 4])

Several approaches for getting the indices of multiple items at once are mentioned in passing in the answers to this related question: "Is there a NumPy function to return the first index of something in an array?". The wide variety and creativity of those answers suggest there is no single best practice, so if your code above works and is easy to understand, I'd say keep it.

I personally found this approach to be both performant and easy to read: https://stackoverflow.com/a/23994923/3823857

Adapting it for your example:

import numpy as np

a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)

indices = [b_list.index(x) for x in a]
vals_at_indices = b_array[indices]

I personally like adding a little bit of error handling in case a value in a does not exist in b.

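import numpy as np

a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
b_set = set(b_list)

# Mark missing values with -1: np.nan is a float and cannot be used as an index.
indices = np.array([b_list.index(x) if x in b_set else -1 for x in a])
found = indices >= 0  # True where the value of a exists in b
vals_at_indices = b_array[indices[found]]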

For my use case it is pretty fast, since it relies on parts of Python that are fast (list comprehensions, .index(), sets, NumPy indexing). I would still love to see a NumPy equivalent of VLOOKUP, or even of a Pandas merge. But this seems to work for now.
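One possible way to get closer to a VLOOKUP-style lookup, without rescanning the list with .index() for every value, is to build a dictionary from value to first index once and then look each value up. This is only a sketch; the name first_index and the -1 sentinel for missing values are illustrative assumptions:

```python
import numpy as np

a = np.array([1, 2, 4, 7])
b = np.array([1, 2, 3, 10, 4, 2])

# Map each value of b to the index of its first occurrence.
first_index = {}
for i, value in enumerate(b.tolist()):
    first_index.setdefault(value, i)

# -1 marks values of a that do not occur in b (an arbitrary sentinel).
indices = np.array([first_index.get(x, -1) for x in a.tolist()])
print(indices)  # [ 0  1  4 -1]
```

Building the dictionary takes one pass over b, and each lookup afterwards is a hash lookup rather than a scan.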

All of the solutions here recommend using a linear search. You can use np.argsort and np.searchsorted to speed things up dramatically for large arrays:

sorter = b.argsort()
i = sorter[np.searchsorted(b, a, sorter=sorter)]
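If b contains duplicates, which of the duplicate positions comes back depends on how argsort orders equal values. A small sketch, assuming every value of a occurs in b: requesting a stable sort keeps equal values in their original order, so the default left-sided searchsorted lands on the first occurrence.

```python
import numpy as np

a = np.array([2, 4, 1])
b = np.array([4, 2, 4, 1, 2])  # 4 and 2 each occur twice

# A stable sort keeps equal values in their original order,
# so the leftmost match in sorted order is the first occurrence in b.
sorter = np.argsort(b, kind="stable")
i = sorter[np.searchsorted(b, a, sorter=sorter)]
print(i)  # [1 0 3]
```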
