
Indices of multiple elements in a numpy array

I have a numpy array and a list as follows:

y=np.array([[1],[2],[1],[3],[1],[3],[2],[2]])
x=[1,2,3]

I would like to return a tuple of arrays, each of which contains the indices of one element of x in y, i.e.

(array([[0,2,4]]),array([[1,6,7]]),array([[3,5]]))

Can this be done in a vectorized fashion (without any loops)?

Try the following:

y = y.flatten()
[np.where(y == searchval)[0] for searchval in x]
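The comprehension above still loops over x in Python. As a rough sketch of how the comparison itself could be done in one whole-array step, the following uses broadcasting to build a boolean mask of shape (len(x), len(y)) and then splits the column indices by row. The variable names mask, counts and res are illustrative only, and the mask costs O(len(x) * len(y)) memory, so this is only sensible when x is short.

```python
import numpy as np

y = np.array([[1], [2], [1], [3], [1], [3], [2], [2]])
x = [1, 2, 3]

# Compare every value of x against every element of y at once.
# mask[i, j] is True where y.ravel()[j] == x[i].
mask = np.asarray(x)[:, None] == y.ravel()

# np.nonzero walks the mask in row-major order, so the column indices
# come out grouped by the corresponding value of x.
rows, cols = np.nonzero(mask)

# Number of matches per value of x, used to cut cols into ragged pieces.
counts = mask.sum(axis=1)
res = tuple(np.split(cols, np.cumsum(counts)[:-1]))

print(res)
```

The final np.split is still needed because the result is ragged: the groups have different lengths and cannot live in one rectangular array.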

One solution is to use map:

y = y.reshape(1, len(y))
list(map(lambda k: np.where(y == k)[-1], x))

[array([0, 2, 4]), 
 array([1, 6, 7]), 
 array([3, 5])]

Performance is reasonable. For 100000 rows:

%timeit list(map(lambda k: np.where(y==k), x))
3.1 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

For this small example, a dictionary approach is actually faster (than the `where` approaches):

dd = {i:[] for i in [1,2,3]}
for i,v in enumerate(y):
   v=v[0]
   if v in dd:
       dd[v].append(i)
list(dd.values())

This problem has come up in other SO questions. Alternatives using unique and sort have been proposed, but they are more complex and harder to recreate, and not necessarily faster.
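For illustration, here is one possible sort-based variant. It is a sketch rather than a reproduction of any particular answer from those other questions: it sorts the indices once with np.argsort and then uses np.searchsorted to find where each value's run starts and ends. The helper name indices_by_value is hypothetical.

```python
import numpy as np

def indices_by_value(y, x):
    """Return a tuple of index arrays, one per value in x (hypothetical helper)."""
    flat = np.asarray(y).ravel()
    # Stable sort keeps equal values in their original order,
    # so each group's indices come out ascending.
    order = np.argsort(flat, kind="stable")
    sorted_vals = flat[order]
    # Each value of x occupies the half-open run [left, right) in sorted_vals.
    left = np.searchsorted(sorted_vals, x, side="left")
    right = np.searchsorted(sorted_vals, x, side="right")
    return tuple(order[lo:hi] for lo, hi in zip(left, right))

y = np.array([[1], [2], [1], [3], [1], [3], [2], [2]])
x = [1, 2, 3]
res = indices_by_value(y, x)
print(res)
```

The remaining Python loop runs over x, not over y, and a value of x that does not occur in y simply yields an empty array.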

It's not an ideal problem for numpy. The result is a list of arrays or lists of differing size, which is a pretty good clue that a simple 'vectorized' whole-array solution is not possible. If speed is an important enough issue you may need to look at numba or cython implementations.

Different methods could have different relative times depending on the mix of values. Few unique values with long sublists might favor methods that use repeated where. Many unique values with short sublists might favor an approach that iterates on y.
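One way to check this on your own data is a small timing harness like the sketch below. The function names with_where and with_dict, the array size and the two value mixes are all arbitrary choices made for illustration; no timings are quoted here because they depend on the machine and the numpy version.

```python
import timeit
from collections import defaultdict

import numpy as np

def with_where(y, x):
    # One full scan of y per value in x.
    flat = y.ravel()
    return [np.where(flat == v)[0] for v in x]

def with_dict(y, x):
    # A single Python-level pass over y, grouping indices by value.
    d = defaultdict(list)
    for idx, item in enumerate(y.ravel().tolist()):
        d[item].append(idx)
    return [np.array(d.get(v, []), dtype=np.intp) for v in x]

rng = np.random.default_rng(0)
n = 20_000  # assumed size, chosen only to keep the run short
cases = {"few unique values": 3, "many unique values": 2_000}
timings = {}

for label, n_unique in cases.items():
    y = rng.integers(0, n_unique, size=(n, 1))
    x = list(range(n_unique))
    t_where = timeit.timeit(lambda: with_where(y, x), number=3)
    t_dict = timeit.timeit(lambda: with_dict(y, x), number=3)
    timings[label] = (t_where, t_dict)
    print(f"{label}: where {t_where:.4f}s, dict {t_dict:.4f}s")
```

Both functions return the same groups, so the comparison only measures how the cost shifts between scanning y once per value and iterating over y once in Python.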

You can use collections.defaultdict followed by a comprehension:

y = np.array([[1],[2],[1],[3],[1],[3],[2],[2]])
x = [1,2,3]

from collections import defaultdict

d = defaultdict(list)
for idx, item in enumerate(y.flat):
    d[item].append(idx)

res = tuple(np.array(d[k]) for k in x)

(array([0, 2, 4]), array([1, 6, 7]), array([3, 5]))
