
Find numpy vectors in a set quickly

I have a NumPy array, for example:

a = np.array([[1,2],
              [3,4],
              [6,4],
              [5,3],
              [3,5]])

and I also have a set:

b = {(1,2),(6,4),(9,9)}

I want to find the indices of the rows of a that exist in set b. Here that is

[0, 2]

I currently use a for loop to do this. Is there a convenient way to do it without a for loop? The for loop I used:

record = []
for i in range(a.shape[0]):
    if (a[i, 0], a[i, 1]) in b:
        record.append(i)

You can use filter:

In [8]: a = np.array([[1,2],
              [3,4],
              [6,4],
              [5,3],
              [3,5]])

In [9]: b = {(1,2),(6,4)}

In [10]: list(filter(lambda x: tuple(a[x]) in b, range(len(a))))
Out[10]: [0, 2]

First off, convert the set to a NumPy array -

b_arr = np.array(list(b))

Then, based on this post, you would have three approaches. Let's use the second approach for efficiency -

dims = np.maximum(a.max(0),b_arr.max(0)) + 1
a1D = np.ravel_multi_index(a.T,dims)
b1D = np.ravel_multi_index(b_arr.T,dims)    
out = np.flatnonzero(np.in1d(a1D,b1D))

Sample run -

In [89]: a
Out[89]: 
array([[1, 2],
       [3, 4],
       [6, 4],
       [5, 3],
       [3, 5]])

In [90]: b
Out[90]: {(1, 2), (6, 4), (9, 9)}

In [91]: b_arr = np.array(list(b))

In [92]: dims = np.maximum(a.max(0),b_arr.max(0)) + 1
    ...: a1D = np.ravel_multi_index(a.T,dims)
    ...: b1D = np.ravel_multi_index(b_arr.T,dims)    
    ...: out = np.flatnonzero(np.in1d(a1D,b1D))
    ...: 

In [93]: out
Out[93]: array([0, 2])
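The steps above can be wrapped into one function. This is only a minimal sketch: the name rows_in_set is hypothetical, np.isin is used as the newer spelling of np.in1d, and it assumes non-negative integer entries, since np.ravel_multi_index encodes each row as a single flat index into a grid of shape dims.

```python
import numpy as np

def rows_in_set(a, b):
    """Return indices of rows of `a` that appear in the set of tuples `b`.

    Hypothetical helper; assumes non-negative integer entries.
    """
    a = np.asarray(a)
    b_arr = np.array(list(b), dtype=a.dtype).reshape(-1, a.shape[1])
    if a.size == 0 or b_arr.size == 0:
        return np.array([], dtype=np.intp)
    if a.min() < 0 or b_arr.min() < 0:
        raise ValueError("ravel_multi_index needs non-negative entries")
    # Grid large enough to hold every coordinate from both inputs.
    dims = np.maximum(a.max(0), b_arr.max(0)) + 1
    # Each row becomes one integer, so a 1-D membership test is enough.
    a1d = np.ravel_multi_index(a.T, dims)
    b1d = np.ravel_multi_index(b_arr.T, dims)
    return np.flatnonzero(np.isin(a1d, b1d))

a = np.array([[1, 2], [3, 4], [6, 4], [5, 3], [3, 5]])
b = {(1, 2), (6, 4), (9, 9)}
print(rows_in_set(a, b))  # [0 2]
```

Note that the product of dims has to fit in the index type, so very large coordinate values can make this encoding unusable.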

For reference, a straightforward list comprehension (loop) answer:

In [108]: [i for i,v in enumerate(a) if tuple(v) in b]
Out[108]: [0, 2]

This is basically the same speed as the filter approach:

In [111]: timeit [i for i,v in enumerate(a) if tuple(v) in b]
10000 loops, best of 3: 24.5 µs per loop

In [114]: timeit list(filter(lambda x: tuple(a[x]) in b, range(len(a))))
10000 loops, best of 3: 29.7 µs per loop

But this is a toy example, so timings aren't meaningful.

If a weren't already an array, these list approaches would be faster than the array ones, due to the overhead of creating arrays.
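To check that on your own data, a small timing harness along these lines could help. It is a sketch under assumptions: the input sizes, the random data, and the function names with_listcomp and with_ravel are made up for illustration, and no particular timings are claimed.

```python
import timeit
import numpy as np

# Hypothetical input: a plain Python list of tuples, not an array.
rng = np.random.default_rng(0)
rows = [tuple(r) for r in rng.integers(0, 100, size=(2000, 2)).tolist()]
b = set(rows[:50]) | {(999, 999)}

def with_listcomp(rows, b):
    # Pure-Python membership test, no array creation.
    return [i for i, v in enumerate(rows) if v in b]

def with_ravel(rows, b):
    # Array route; the cost of building the arrays is included.
    a = np.array(rows)
    b_arr = np.array(list(b))
    dims = np.maximum(a.max(0), b_arr.max(0)) + 1
    a1d = np.ravel_multi_index(a.T, dims)
    b1d = np.ravel_multi_index(b_arr.T, dims)
    return np.flatnonzero(np.isin(a1d, b1d)).tolist()

# Both routes should agree before their timings are compared.
same = with_listcomp(rows, b) == with_ravel(rows, b)

for fn in (with_listcomp, with_ravel):
    seconds = timeit.timeit(lambda: fn(rows, b), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e6:.1f} µs per call")
```

Which route wins is likely to depend on the number of rows and the size of the set, so it is worth trying a few sizes.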

There are some numpy set operations, but they work with 1d arrays. We can get around that by converting 2d arrays to 1d structured arrays.

In [117]: a.view('i,i')
Out[117]: 
array([[(1, 2)],
       [(3, 4)],
       [(6, 4)],
       [(5, 3)],
       [(3, 5)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
In [119]: np.array(list(b),'i,i')
Out[119]: 
array([(1, 2), (6, 4), (9, 9)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])

There is a version of this using np.void, but it's easier to remember and play with this 'i,i' dtype.

So this works:

In [123]: np.nonzero(np.in1d(a.view('i,i'),np.array(list(b),'i,i')))[0]
Out[123]: array([0, 2], dtype=int32)
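One caveat: 'i,i' means two 32-bit integers, so the view only lines up with the rows when a has dtype int32, as in the session above. On platforms where the default integer is 64-bit, a sketch like the following builds the structured dtype from a.dtype instead. The names row_dtype, a_rows and b_rows are mine, and np.isin is used as the newer spelling of np.in1d.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [6, 4], [5, 3], [3, 5]])
b = {(1, 2), (6, 4), (9, 9)}

# One field per column, each using the array's own integer type.
row_dtype = np.dtype([("f0", a.dtype), ("f1", a.dtype)])

# The view needs contiguous memory; ravel turns shape (n, 1) into (n,).
a_rows = np.ascontiguousarray(a).view(row_dtype).ravel()
b_rows = np.array(list(b), dtype=row_dtype)

idx = np.flatnonzero(np.isin(a_rows, b_rows))
print(idx)  # [0 2]
```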

but it is much slower than the iterations:

In [124]: timeit np.nonzero(np.in1d(a.view('i,i'),np.array(list(b),'i,i')))[0]
10000 loops, best of 3: 153 µs per loop

As discussed in other recent union questions, np.in1d uses several strategies. One is based on broadcasting and where. The other uses unique, concatenation, sorting and differences.
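To make the second strategy concrete, here is a simplified sketch of a sort-based membership test for 1d inputs. It illustrates the idea (unique, concatenate, stable sort, compare neighbours) and is not NumPy's actual source; the function name in1d_sort_sketch is hypothetical.

```python
import numpy as np

def in1d_sort_sketch(ar1, ar2):
    """Illustrative sort-based membership test for 1-D arrays."""
    # Deduplicate both sides, remembering how to expand ar1 again.
    ar1, rev_idx = np.unique(ar1, return_inverse=True)
    ar2 = np.unique(ar2)
    ar = np.concatenate((ar1, ar2))
    # A stable sort keeps the ar1 copy ahead of an equal ar2 copy.
    order = ar.argsort(kind="mergesort")
    sar = ar[order]
    # A value followed by an equal neighbour occurs in both inputs.
    dup = np.concatenate((sar[1:] == sar[:-1], [False]))
    ret = np.empty(ar.shape, dtype=bool)
    ret[order] = dup
    # Keep the ar1 part and map it back to the original positions.
    return ret[:len(ar1)][rev_idx]

mask = in1d_sort_sketch(np.array([1, 5, 2, 5]), np.array([5, 9]))
print(mask)  # [False  True False  True]
```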

A broadcasting solution (yes, it's messy), but faster than in1d.

In [150]: timeit np.nonzero((a[:,:,None,None]==np.array(list(b))[:,:]).any(axis=-1).any(axis=-1).all(axis=-1))[0]
10000 loops, best of 3: 52.2 µs per loop
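A word of caution on the expression timed above: it tests each coordinate of a row against every value in b separately, so it happens to give the right answer for the sample data but would also report a row such as (2, 6). A sketch that compares whole rows instead is shown below; the extra row and the variable names are added here only for illustration.

```python
import numpy as np

# Sample data plus one extra row, (2, 6), that is NOT in b.
a = np.array([[1, 2], [3, 4], [6, 4], [5, 3], [3, 5], [2, 6]])
b = {(1, 2), (6, 4), (9, 9)}
b_arr = np.array(list(b))

# Compare every row of a with every row of b: shape (len(a), len(b), 2).
eq = a[:, None, :] == b_arr[None, :, :]
# A row matches if all of its coordinates equal those of some row of b.
idx = np.flatnonzero(eq.all(axis=-1).any(axis=-1))
print(idx)  # [0 2]

# The element-wise reduction from the timing above, for comparison.
elementwise = np.nonzero(
    (a[:, :, None, None] == b_arr[:, :]).any(axis=-1).any(axis=-1).all(axis=-1)
)[0]
print(elementwise)  # [0 2 5]
```

The intermediate array has len(a) * len(b) * 2 elements, so this is only practical when both inputs are small.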

A one-line solution using a list comprehension:

In [62]: a = np.array([[1,2],
    ...:               [3,4],
    ...:               [6,4],
    ...:               [5,3],
    ...:               [3,5]])

In [63]: b = set(((1,2),(6,4),(9,9)))
In [64]: np.where([tuple(e) in b for e in a])[0]
Out[64]: array([0, 2])
