intersection of np.array and set

I would like to intersect an np.array with a set without having to convert the np.array to a list first (slows down the program to an unworkable level).

Here's my current code: (Note that I'm getting this data from b,g,r rawCapture, and selection_data is simply a set from beforehand.)

def GreenCalculations(data):
    data.reshape(1,-1,3)
    data={tuple(item) for item in data[0]}
    ColourCount=selection_data & set(data)
    return ColourCount

My current issue, I think, is that I'm only comparing the top row of the picture, because of data[0]. Is it possible to loop through all the rows?

Note: tolist() takes lots of time.

First, some sample data; I'm guessing it's an n x n x 3 array with dtype uint8:

In [791]: data=np.random.randint(0,256,(8,8,3),dtype=np.uint8)

The reshape method returns a new array with the new shape; it doesn't change the original array in place:

In [793]: data.reshape(1,-1,3)

data.shape=(1,-1,3) would do that in place. But why the initial 1?

Instead:

In [795]: aset={tuple(item) for item in data.reshape(-1,3)}
In [796]: aset
Out[796]: 
{(3, 92, 60),
 (5, 211, 227),
 (6, 185, 183),
 (9, 37, 0),
 ....

In [797]: len(aset)
Out[797]: 64

In my case a set of 64 unique items - not surprising given how I generated the values

Your do-nothing data.reshape line and {tuple(item) for item in data[0]} account for why it seems to be working on just the 1st row of the picture.
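
To make the point about reshape concrete, here is a minimal sketch. The zero-filled array is only a stand-in for the real frame, and the variable names are made up for illustration:

```python
import numpy as np

# Stand-in for the captured frame (assumed 8x8 pixels, 3 channels)
data = np.zeros((8, 8, 3), dtype=np.uint8)

# The returned array is thrown away, so data keeps its old shape
data.reshape(-1, 3)
shape_after_call = data.shape      # (8, 8, 3)

# Keeping the return value gives one row per pixel
flat = data.reshape(-1, 3)
flat_shape = flat.shape            # (64, 3)

# Assigning to .shape changes data itself
data.shape = (-1, 3)
shape_after_assign = data.shape    # (64, 3)
```

With one row per pixel, iterating over the reshaped array visits every pixel, not just the first image row.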

I'm guessing selection_data is similar 3 item tuples, such as:

In [801]: selection_data = {tuple(data[1,3,:]), (1,2,3), tuple(data[5,5,:])}
In [802]: selection_data
Out[802]: {(1, 2, 3), (49, 132, 26), (76, 131, 16)}
In [803]: selection_data&aset
Out[803]: {(49, 132, 26), (76, 131, 16)}

You don't say where you attempt to use tolist, but I'm guessing it's in generating the set of tuples.

But curiously, tolist speeds up the conversion:

In [808]: timeit {tuple(item) for item in data.reshape(-1,3).tolist()}
10000 loops, best of 3: 57.7 µs per loop
In [809]: timeit {tuple(item) for item in data.reshape(-1,3)}
1000 loops, best of 3: 239 µs per loop
In [815]: timeit data.reshape(-1,3).tolist()
100000 loops, best of 3: 19.8 µs per loop
In [817]: timeit {tuple(item.tolist()) for item in data.reshape(-1,3)}
10000 loops, best of 3: 100 µs per loop

So for doing this sort of list and set operation, we might as well jump to the list format right away.
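
Putting that together, a corrected version of the function might look like the sketch below. The name green_calculations, the explicit selection_data parameter, and the tiny 2x2 test image are my own choices for illustration, not the asker's code:

```python
import numpy as np

def green_calculations(data, selection_data):
    """Return the colours that occur both in the image and in selection_data."""
    # One row per pixel, converted to nested lists of plain Python ints
    pixels = data.reshape(-1, 3).tolist()
    # Tuples are hashable, so they can go into a set for the intersection
    return selection_data & {tuple(p) for p in pixels}

# Made-up 2x2 image with 3 channels
image = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [1, 2, 3]]], dtype=np.uint8)
wanted = {(1, 2, 3), (7, 8, 9), (0, 0, 0)}

common = green_calculations(image, wanted)
print(common)
```

Passing selection_data as an argument instead of relying on a global is only a convenience for testing; the set logic is the same as in the transcript above.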

numpy has some set functions, for example np.in1d. That only operates on 1d arrays, but as has been demonstrated in some unique-row questions, we can get around that by viewing the 2d array as a structured array. I had to fiddle around to get this far:

In [880]: dt=np.dtype('uint8,uint8,uint8')
In [881]: data1=data.reshape(-1,3).view(dt).ravel()
In [882]: data1
Out[882]: 
array([(41, 145, 254), (138, 144, 7), (192, 241, 203), (42, 177, 215),
       (78, 132, 87), (221, 176, 87), (107, 171, 147), (231, 13, 53),
       ... 
      dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1')])

Construct a selection with the same structured array nature:

In [883]: selection=[data[1,3,:],[1,2,3],data[5,5,:]]
In [885]: selection=np.array(selection,np.uint8).view(dt)
In [886]: selection
Out[886]: 
array([[(49, 132, 26)],
       [(1, 2, 3)],
       [(76, 131, 16)]], 
      dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1')])

So the items in selection that are also found in data1 are:

In [888]: np.in1d(selection,data1)
Out[888]: array([ True, False,  True], dtype=bool)

and the items in data1 that are in selection are:

In [890]: np.where(np.in1d(data1,selection))
Out[890]: (array([11, 45], dtype=int32),)

or, in the unraveled shape:

In [891]: np.where(np.in1d(data1,selection).reshape(8,8))
Out[891]: (array([1, 5], dtype=int32), array([3, 5], dtype=int32))

These are the same (1,3) and (5,5) items I used to generate selection.

The in1d timings are competitive:

In [892]: %%timeit
     ...: data1=data.reshape(-1,3).view(dt).ravel()
     ...: np.in1d(data1,selection)
     ...: 
10000 loops, best of 3: 65.7 µs per loop

In [894]: timeit selection_data&{tuple(item) for item in data.reshape(-1,3).tolist()}
10000 loops, best of 3: 91.5 µs per loop
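
For reference, the structured-view approach can be collected into one self-contained sketch. It uses np.isin, the shape-preserving counterpart of np.in1d, and assumes a uint8 image; the 2x2 image and the wanted array are made-up test values:

```python
import numpy as np

# One structured element per pixel: three uint8 fields (f0, f1, f2)
dt = np.dtype('uint8,uint8,uint8')

# Made-up 2x2 image and colours to look for
image = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [1, 2, 3]]], dtype=np.uint8)
wanted = np.array([[1, 2, 3], [0, 0, 0], [7, 8, 9]], dtype=np.uint8)

# The view needs contiguous memory, hence ascontiguousarray
pixels = np.ascontiguousarray(image).reshape(-1, 3).view(dt).ravel()
targets = wanted.view(dt).ravel()

# Boolean mask over the image: True where the pixel colour is in wanted
mask = np.isin(pixels, targets).reshape(image.shape[:2])
rows, cols = np.nonzero(mask)

# Distinct matching colours, back as a set of tuples
matched = {tuple(c) for c in np.unique(image[mask], axis=0).tolist()}
```

The mask gives the pixel positions directly, which the pure set intersection cannot do; if only the distinct colours are needed, the set version is simpler.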

If I understand your question correctly (and I'm not 100% sure that I do, but I'm using the same assumptions as hpaulj), your problem can be solved using the numpy_indexed package:

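import numpy as np
import numpy_indexed as npi
# convert the set to a list first; np.asarray on a set gives a 0-d object array
ColourCount = npi.intersection(data.reshape(-1, 3), np.asarray(list(selection_data), dtype=data.dtype))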

That is, it treats both the reshaped array and the set as sequences of length-3 ndarrays, and finds their intersection in a vectorized manner.
