I would like to intersect an np.array with a set without having to convert the np.array to a list first (that slows the program down to an unworkable level).
Here's my current code (note that I'm getting this data from a b,g,r rawCapture, and selection_data is simply a set built beforehand):
def GreenCalculations(data):
    data.reshape(1,-1,3)
    data={tuple(item) for item in data[0]}
    ColourCount=selection_data & set(data)
    return ColourCount
My current issue, I think, is that I'm only comparing the top row of the picture, because of data[0]. Is it possible to loop through all the rows?
Note: tolist() takes a lot of time.
First, some sample data. I'm guessing it's an n x n x 3 array with dtype uint8:
In [791]: data=np.random.randint(0,256,(8,8,3),dtype=np.uint8)
The reshape method returns a new array with the new shape, but doesn't change the original in place:
In [793]: data.reshape(1,-1,3)
data.shape=(1,-1,3)
would do that in place. But why the initial 1?
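A minimal sketch of the difference, using a small made-up array (`img` and `flat` are illustrative names): the result of `reshape` has to be kept, while assigning to `.shape` changes the array itself. Assigning to `.shape` raises an AttributeError when the new shape can't be produced without a copy; `reshape` is the safe choice then.

```python
import numpy as np

# Small stand-in image: 2 x 2 pixels, 3 channels
img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

img.reshape(-1, 3)          # result discarded: img itself is unchanged
print(img.shape)            # (2, 2, 3)

flat = img.reshape(-1, 3)   # keep the returned array instead
print(flat.shape)           # (4, 3)

img.shape = (-1, 3)         # assigning .shape modifies img in place
print(img.shape)            # (4, 3)
```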
Instead:
In [795]: aset={tuple(item) for item in data.reshape(-1,3)}
In [796]: aset
Out[796]:
{(3, 92, 60),
(5, 211, 227),
(6, 185, 183),
(9, 37, 0),
....
In [797]: len(aset)
Out[797]: 64
In my case it's a set of 64 unique items, which is not surprising given how I generated the values.
Your do-nothing data.reshape line and {tuple(item) for item in data[0]} account for why it seems to be working on just the first row of the picture.
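Putting that together, a corrected version of the question's function might look like the sketch below. The PEP 8 name `green_calculations` and passing `selection_data` as a parameter (rather than reading a global) are my assumptions, not the asker's code; the test image is made up so the expected matches are known in advance.

```python
import numpy as np

def green_calculations(data, selection_data):
    """Return the colours from selection_data that occur anywhere in data.

    data: (height, width, 3) uint8 image array (b,g,r order, as in the question)
    selection_data: set of 3-tuples of ints
    """
    # reshape returns a new array, so keep its result: all pixels, one per row
    pixels = data.reshape(-1, 3)
    # One bulk tolist() call is faster than tuple() on each numpy row
    colours = {tuple(p) for p in pixels.tolist()}
    return selection_data & colours

# Deterministic 8x8 test image: pixel k holds (3k, 3k+1, 3k+2)
img = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
wanted = {tuple(img[1, 3].tolist()), (1, 2, 3), tuple(img[5, 5].tolist())}
print(sorted(green_calculations(img, wanted)))  # [(33, 34, 35), (135, 136, 137)]
```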
I'm guessing selection_data is a set of similar 3-item tuples, such as:
In [801]: selection_data = {tuple(data[1,3,:]), (1,2,3), tuple(data[5,5,:])}
In [802]: selection_data
Out[802]: {(1, 2, 3), (49, 132, 26), (76, 131, 16)}
In [803]: selection_data&aset
Out[803]: {(49, 132, 26), (76, 131, 16)}
You don't say where you attempt to use tolist, but I'm guessing it's when generating the set of tuples.
But curiously, tolist speeds up the conversion:
In [808]: timeit {tuple(item) for item in data.reshape(-1,3).tolist()}
10000 loops, best of 3: 57.7 µs per loop
In [809]: timeit {tuple(item) for item in data.reshape(-1,3)}
1000 loops, best of 3: 239 µs per loop
In [815]: timeit data.reshape(-1,3).tolist()
100000 loops, best of 3: 19.8 µs per loop
In [817]: timeit {tuple(item.tolist()) for item in data.reshape(-1,3)}
10000 loops, best of 3: 100 µs per loop
So for this sort of list and set operation, we might as well convert to a list right away.
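To reproduce the comparison outside IPython, here's a sketch with the standard `timeit` module. The function names are hypothetical, and absolute numbers will vary by machine and NumPy version. It also checks that both routes build the same set, since NumPy integer scalars compare and hash like the equivalent Python ints.

```python
import timeit
import numpy as np

# Stand-in image with the same shape and dtype as the sample above
data = np.random.randint(0, 256, (8, 8, 3), dtype=np.uint8)
pixels = data.reshape(-1, 3)

def via_tolist():
    # One bulk conversion to nested Python lists, then tuple each row
    return {tuple(row) for row in pixels.tolist()}

def via_rows():
    # Iterating the array yields 1d arrays; tuple() then makes numpy scalars
    return {tuple(row) for row in pixels}

# Same contents either way, so only the speed differs
assert via_tolist() == via_rows()

for fn in (via_tolist, via_rows):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```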
numpy has some set functions, for example np.in1d. That only operates on 1d arrays, but as has been demonstrated in some "unique rows" questions, we can get around that by viewing the 2d array as a structured array. I had to fiddle around to get this far:
In [880]: dt=np.dtype('uint8,uint8,uint8')
In [881]: data1=data.reshape(-1,3).view(dt).ravel()
In [882]: data1
Out[882]:
array([(41, 145, 254), (138, 144, 7), (192, 241, 203), (42, 177, 215),
(78, 132, 87), (221, 176, 87), (107, 171, 147), (231, 13, 53),
...
dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1')])
Construct a selection with the same structured dtype:
In [883]: selection=[data[1,3,:],[1,2,3],data[5,5,:]]
In [885]: selection=np.array(selection,np.uint8).view(dt)
In [886]: selection
Out[886]:
array([[(49, 132, 26)],
[(1, 2, 3)],
[(76, 131, 16)]],
dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1')])
So the items in selection that are also found in data1 are:
In [888]: np.in1d(selection,data1)
Out[888]: array([ True, False, True], dtype=bool)
And the items in data1 that are in selection are:
In [890]: np.where(np.in1d(data1,selection))
Out[890]: (array([11, 45], dtype=int32),)
or, in the original 8x8 shape:
In [891]: np.where(np.in1d(data1,selection).reshape(8,8))
Out[891]: (array([1, 5], dtype=int32), array([3, 5], dtype=int32))
These are the same (1,3) and (5,5) items I used to generate selection.
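Here's the same idea wrapped up as a reusable function, sketched under the assumption that the image is (height, width, 3) and the selection is a Python set of 3-tuples like selection_data. `match_mask` and the field names `c0`..`c2` are hypothetical. It uses `np.isin`, which does the same job as `np.in1d` and is the recommended spelling now that `np.in1d` is deprecated (as of NumPy 2.0).

```python
import numpy as np

def match_mask(image, colours):
    """Boolean (height, width) mask of pixels whose colour is in `colours`.

    image: (height, width, 3) array, e.g. uint8
    colours: iterable of 3-tuples (e.g. a set like selection_data)
    """
    # One structured field per channel, so each pixel becomes a single element
    dt = np.dtype([(f"c{i}", image.dtype) for i in range(3)])
    h, w, _ = image.shape
    # view() needs the channel axis contiguous; ascontiguousarray guarantees it
    pixels = np.ascontiguousarray(image).reshape(-1, 3).view(dt).ravel()
    # A set is not a sequence, so list() it before building the array
    sel = np.array(list(colours), dtype=image.dtype).reshape(-1, 3)
    sel = np.ascontiguousarray(sel).view(dt).ravel()
    # isin works on these 1d structured arrays; reshape back to image layout
    return np.isin(pixels, sel).reshape(h, w)

# Deterministic 8x8 test image: pixel k holds (3k, 3k+1, 3k+2)
img = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
mask = match_mask(img, {(33, 34, 35), (1, 2, 3), (135, 136, 137)})
print(np.argwhere(mask).tolist())  # [[1, 3], [5, 5]]
```

np.argwhere gives the (row, column) pairs directly, equivalent to the np.where call above but as one array.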
The in1d timings are competitive:
In [892]: %%timeit
...: data1=data.reshape(-1,3).view(dt).ravel()
...: np.in1d(data1,selection)
...:
10000 loops, best of 3: 65.7 µs per loop
In [894]: timeit selection_data&{tuple(item) for item in data.reshape(-1,3).tolist()}
10000 loops, best of 3: 91.5 µs per loop
If I understand your question correctly (and I'm not 100% sure that I do, but I'm using the same assumptions as hpaulj), your problem can be solved using the numpy_indexed package:
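import numpy as np
import numpy_indexed as npi
# np.asarray does not iterate over a set (it gives a 0-d object array), so list() it first
ColourCount = npi.intersection(data.reshape(-1, 3), np.array(list(selection_data), dtype=data.dtype))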
That is, it treats both the reshaped array and the set as sequences of length-3 ndarrays and finds their intersection in a vectorized manner.
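If installing numpy_indexed isn't an option, a pure-NumPy alternative (a different technique, not numpy_indexed's implementation) is to reuse the structured-view trick with `np.intersect1d`. `row_intersection` is a hypothetical helper name; the sketch assumes both inputs are 2d with the same number of columns, and that the selection values fit in the first array's dtype.

```python
import numpy as np

def row_intersection(a, b):
    """Unique rows present in both a and b, sorted lexicographically."""
    a = np.ascontiguousarray(a)
    # Accepts a list of tuples too, e.g. list(selection_data)
    b = np.ascontiguousarray(b, dtype=a.dtype)
    # View each row as a single structured element so 1d set routines apply
    dt = np.dtype([(f"c{i}", a.dtype) for i in range(a.shape[1])])
    common = np.intersect1d(a.view(dt).ravel(), b.view(dt).ravel())
    # Turn the structured result back into an (m, ncols) plain array
    return common.view(a.dtype).reshape(-1, a.shape[1])

# Deterministic 8x8 test image: pixel k holds (3k, 3k+1, 3k+2)
img = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
selection_data = {(33, 34, 35), (1, 2, 3), (135, 136, 137)}
common = row_intersection(img.reshape(-1, 3), list(selection_data))
print(common.tolist())  # [[33, 34, 35], [135, 136, 137]]
```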