Find numpy vectors in a set quickly

Question

I have a numpy array, for example:

a = np.array([[1,2],
              [3,4],
              [6,4],
              [5,3],
              [3,5]])

and I also have a set

b = {(1,2),(6,4),(9,9)}

I want to find the indices of the rows of a that exist in set b. Here that is

[0, 2]

Currently I use a for loop to do this. Is there a convenient way to do it without a for loop? The loop I used:

record = []
for i in range(a.shape[0]):
    if (a[i, 0], a[i, 1]) in b:
        record.append(i)

Answer 1

You can use filter:

In [8]: a = np.array([[1,2],
              [3,4],
              [6,4],
              [5,3],
              [3,5]])

In [9]: b = {(1,2),(6,4)}

In [10]: list(filter(lambda x: tuple(a[x]) in b, range(len(a))))  # list() is needed on Python 3, where filter returns an iterator
Out[10]: [0, 2]

Answer 2

First off, convert the set to a NumPy array -

b_arr = np.array(list(b))

Then, based on this post, you would have three approaches. Let's use the second approach for efficiency -

dims = np.maximum(a.max(0),b_arr.max(0)) + 1
a1D = np.ravel_multi_index(a.T,dims)
b1D = np.ravel_multi_index(b_arr.T,dims)    
out = np.flatnonzero(np.in1d(a1D,b1D))

Sample run -

In [89]: a
Out[89]: 
array([[1, 2],
       [3, 4],
       [6, 4],
       [5, 3],
       [3, 5]])

In [90]: b
Out[90]: {(1, 2), (6, 4), (9, 9)}

In [91]: b_arr = np.array(list(b))

In [92]: dims = np.maximum(a.max(0),b_arr.max(0)) + 1
    ...: a1D = np.ravel_multi_index(a.T,dims)
    ...: b1D = np.ravel_multi_index(b_arr.T,dims)    
    ...: out = np.flatnonzero(np.in1d(a1D,b1D))
    ...: 

In [93]: out
Out[93]: array([0, 2])
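
For illustration, the steps above can be wrapped in a small function. This is only a sketch: the name rows_in_set_linear is made up here, it uses np.isin (the newer spelling that recent NumPy versions recommend over np.in1d), and it assumes all entries are non-negative integers and that b is non-empty, since np.ravel_multi_index treats the values as grid indices.

```python
import numpy as np

def rows_in_set_linear(a, b):
    """Return indices of rows of `a` that appear in the set of tuples `b`.

    Hypothetical helper; assumes non-negative integer entries and non-empty `b`.
    """
    b_arr = np.array(list(b))
    # Grid extent per column, large enough to hold every value in a and b.
    dims = np.maximum(a.max(0), b_arr.max(0)) + 1
    # Map each row (x, y) to one integer, x * dims[1] + y for the 2-column case,
    # so that row equality becomes scalar equality.
    a1d = np.ravel_multi_index(a.T, dims)
    b1d = np.ravel_multi_index(b_arr.T, dims)
    return np.flatnonzero(np.isin(a1d, b1d))

a = np.array([[1, 2], [3, 4], [6, 4], [5, 3], [3, 5]])
b = {(1, 2), (6, 4), (9, 9)}
out = rows_in_set_linear(a, b)
```

With the sample data, dims is [10, 10], so (1, 2) maps to 12 and (6, 4) maps to 64, and the membership test runs on plain 1d integer arrays.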

Answer 3

For reference, a straightforward list comprehension (loop) answer:

In [108]: [i for i,v in enumerate(a) if tuple(v) in b]
Out[108]: [0, 2]

basically the same speed as the filter approach:

In [111]: timeit [i for i,v in enumerate(a) if tuple(v) in b]
10000 loops, best of 3: 24.5 µs per loop

In [114]: timeit list(filter(lambda x: tuple(a[x]) in b, range(len(a))))
10000 loops, best of 3: 29.7 µs per loop

But this is a toy example, so timings aren't meaningful.

If a wasn't already an array, these list approaches would be faster than the array ones, due to the overhead of creating arrays.

There are some numpy set operations, but they work with 1d arrays. We can get around that by viewing the 2d arrays as 1d structured arrays, so that each row becomes a single element.

In [117]: a.view('i,i')
Out[117]: 
array([[(1, 2)],
       [(3, 4)],
       [(6, 4)],
       [(5, 3)],
       [(3, 5)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
In [119]: np.array(list(b),'i,i')
Out[119]: 
array([(1, 2), (6, 4), (9, 9)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])

There is a version of this using np.void, but it's easier to remember and play with this 'i,i' dtype.

So this works:

In [123]: np.nonzero(np.in1d(a.view('i,i'),np.array(list(b),'i,i')))[0]
Out[123]: array([0, 2], dtype=int32)

but it is much slower than the iterations:

In [124]: timeit np.nonzero(np.in1d(a.view('i,i'),np.array(list(b),'i,i')))[0]
10000 loops, best of 3: 153 µs per loop
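
One caveat with the 'i,i' view: it assumes a holds 32-bit integers, as the '<i4' fields in the output above show. On a platform where the default integer is 64 bits, the same view would not line up with the rows. A hedged sketch that builds the structured dtype from a's own dtype instead (the function name is made up for illustration, and np.isin stands in for np.in1d):

```python
import numpy as np

def rows_in_set_structured(a, b):
    """Return indices of rows of `a` found in the set of tuples `b`.

    Hypothetical helper: views each row as one structured element.
    """
    a = np.ascontiguousarray(a)  # a view needs contiguous rows
    # One field per column, each with a's own item type (not hard-coded int32).
    row_dtype = np.dtype([(f"f{k}", a.dtype) for k in range(a.shape[1])])
    a_rows = a.view(row_dtype).ravel()           # shape (n,), one record per row
    b_rows = np.array(list(b), dtype=row_dtype)  # tuples become records
    return np.flatnonzero(np.isin(a_rows, b_rows))

a = np.array([[1, 2], [3, 4], [6, 4], [5, 3], [3, 5]])
b = {(1, 2), (6, 4), (9, 9)}
out = rows_in_set_structured(a, b)
```

This does not change the speed picture described here; it only removes the dependence on the integer width.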

As discussed in other recent union questions, np.in1d uses several strategies. One is based on broadcasting and where. The other uses unique, concatenation, sorting and differences.

A broadcasting solution (yes, it's messy) - but faster than in1d.

In [150]: timeit np.nonzero((a[:,:,None,None]==np.array(list(b))[:,:]).any(axis=-1).any(axis=-1).all(axis=-1))[0]
10000 loops, best of 3: 52.2 µs per loop
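
A word of caution on the broadcasting expression above: as written, it checks whether each value in a row of a occurs anywhere in b, not whether the row as a whole matches a tuple in b. It gives [0, 2] for the sample data, but a row such as [1, 4] would also pass, because 1 and 4 both occur somewhere in b. A sketch of a row-wise comparison follows; the extra row [1, 4] and the variable names are added here purely to show the difference.

```python
import numpy as np

# Sample data plus one extra row, [1, 4], whose values both occur in b
# although the pair (1, 4) itself does not.
a = np.array([[1, 2], [3, 4], [6, 4], [5, 3], [3, 5], [1, 4]])
b = {(1, 2), (6, 4), (9, 9)}
b_arr = np.array(list(b))

# Element-wise test from the timed expression: ignores column position.
loose = (a[:, :, None, None] == b_arr).any(axis=-1).any(axis=-1).all(axis=-1)

# Row-wise test: (n, 1, 2) == (1, m, 2) -> (n, m, 2); a row of a matches
# when all of its columns equal the same row of b.
strict = (a[:, None, :] == b_arr[None, :, :]).all(axis=-1).any(axis=-1)

loose_idx = np.flatnonzero(loose)    # includes the spurious row 5
strict_idx = np.flatnonzero(strict)  # only rows that are really in b
```

The row-wise form builds an n-by-m-by-2 boolean array, so memory grows with len(a) * len(b); for large inputs the in1d-style approaches are likely the safer choice.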

Answer 4

A one line solution using a list comprehension:

In [62]: a = np.array([[1,2],
    ...:               [3,4],
    ...:               [6,4],
    ...:               [5,3],
    ...:               [3,5]])

In [63]: b = set(((1,2),(6,4),(9,9)))
In [64]: np.where([tuple(e) in b for e in a])[0]
Out[64]: array([0, 2])

4 answers

solution1
1 2016-08-30 04:35:08

solution2
1 2016-08-30 04:58:33

solution3
0 2016-08-30 06:45:20

solution4
0 2016-08-30 08:12:46
