Say I have two numpy arrays of lists:
import numpy as np
r = np.array([[1,2,3],[1,2,3],[4,5]], dtype=object)
q = np.array([[1,2,3],[1,2,3]])
and I use numpy.unique to trim them down to just the unique lists:
np.unique(r)
array([[1, 2, 3], [4, 5]], dtype=object)
np.unique(q)
array([1, 2, 3])
You can see that the output from np.unique(q) has a different depth.
My question is: how can I keep a consistent "depth" for both of the above examples?
That is, so that np.unique(q) gives:
array([[1, 2, 3]], dtype=object)
This has to do with the way q is constructed. When you say
q = np.array([[1,2,3],[1,2,3]])
you are creating a 2-dimensional array of 6 items:
In [77]: q.size
Out[77]: 6
When you apply np.unique to this array, the array is flattened first (the default when no axis is given), so each of the 6 items is considered a separate value, and the unique ones are returned.
If instead you create a q with just 2 items, each of which is a Python list:
In [78]: q = np.empty(2, dtype='object')
In [79]: q[:] = [[1,2,3],[1,2,3]]
In [80]: q.size
Out[80]: 2
then np.unique returns the desired result:
In [81]: np.unique(q)
Out[81]: array([[1, 2, 3]], dtype=object)
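The construction above can be wrapped in a small helper so that both r and q are built the same way. This is only a minimal sketch: as_object_array is a hypothetical name, not a NumPy function, and it fills the slots one at a time so that NumPy never has to guess the shape from the nested lists.

```python
import numpy as np

def as_object_array(rows):
    """Pack each row into one slot of a 1-D object array (hypothetical helper)."""
    out = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        out[i] = list(row)  # one Python list per slot, no shape inference
    return out

r = as_object_array([[1, 2, 3], [1, 2, 3], [4, 5]])
q = as_object_array([[1, 2, 3], [1, 2, 3]])

unique_r = np.unique(r)
unique_q = np.unique(q)
print(unique_r.shape, unique_r.tolist())
print(unique_q.shape, unique_q.tolist())
```

Because both inputs are 1-D arrays of lists, both results are 1-D arrays of lists as well, which is the consistent "depth" the question asks for.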
The difference is perhaps clearer if you start with a different q:
In [20]: q = np.array([[1,2,3],[1,2,4]])
In [21]: q2 = np.empty(2, dtype='object')
In [22]: q2[:] = [[1,2,3],[1,2,4]]
In [23]: q
Out[23]:
array([[1, 2, 3],
       [1, 2, 4]])
In [24]: q2
Out[24]: array([[1, 2, 3], [1, 2, 4]], dtype=object)
These two arrays, q and q2, look similar, but they behave differently.
q is an array of shape (2, 3) with 6 values, which are ints.
q2 is an array of shape (2,) with 2 values, which are Python lists.
When you apply np.unique to q, it finds the unique values among the 6 ints.
When you apply np.unique to q2, it finds the unique values among the 2 lists.
In [25]: np.unique(q)
Out[25]: array([1, 2, 3, 4])
In [26]: np.unique(q2)
Out[26]: array([[1, 2, 3], [1, 2, 4]], dtype=object)
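If it is unclear which kind of array you are holding, checking its shape, size, dtype and element type usually settles it. The snippet below is a rough illustration using the same q and q2 as above; q2 is filled slot by slot here purely to keep the example unambiguous.

```python
import numpy as np

q = np.array([[1, 2, 3], [1, 2, 4]])   # 2-D integer array
q2 = np.empty(2, dtype=object)          # 1-D array with two object slots
q2[0] = [1, 2, 3]
q2[1] = [1, 2, 4]

for name, arr in (("q", q), ("q2", q2)):
    # shape and size show how many items np.unique will compare
    print(name, arr.shape, arr.size, arr.dtype, type(arr.flat[0]).__name__)
```

For q the items are NumPy integers; for q2 they are Python lists, which is why np.unique treats each list as a single value.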
Your alternative is really just making np.unique(q) 2-dimensional.
In [27]: np.array([np.unique(q).tolist()],dtype='object')
Out[27]: array([[1, 2, 3, 4]], dtype=object)
If that is what you want to do, then you could instead use np.atleast_2d:
In [28]: np.atleast_2d(np.unique(q))
Out[28]: array([[1, 2, 3, 4]])
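For rectangular input there is another option that the answer above does not use: the axis argument of np.unique (added in NumPy 1.13) compares whole rows instead of flattened scalars. A minimal sketch, assuming every row has the same length:

```python
import numpy as np

q = np.array([[1, 2, 3], [1, 2, 3]])
rows = np.unique(q, axis=0)   # deduplicate rows, keep the 2-D shape
print(rows.shape)
print(rows.tolist())
```

This keeps the result 2-dimensional, but the NumPy documentation states that axis is not supported for object arrays, so it does not help with a ragged array such as r.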
If you want the result to be
array([[1, 2, 3], [1, 2, 4]], dtype=object)
then you have to construct q as a 2-element array of dtype object. Initializing it with
q = np.empty(2, dtype='object')
is the easiest way I know of to achieve this.
And if you find yourself dealing with NumPy arrays of dtype object, ask yourself if you might not be better off with plain Python objects:
In [32]: set(map(tuple, ([[1, 2, 3], [1, 2, 4]])))
Out[32]: {(1, 2, 3), (1, 2, 4)}
In [33]: %timeit set(map(tuple, ([[1, 2, 3], [1, 2, 4]])))
1000000 loops, best of 3: 1.07 µs per loop
In [34]: %timeit np.unique(q2)
100000 loops, best of 3: 13.1 µs per loop
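The set-based approach above returns tuples in no particular order. If you would rather get lists back in first-seen order, one possible variation is sketched below; unique_rows is a hypothetical helper name, and it relies on dicts preserving insertion order (Python 3.7+).

```python
def unique_rows(rows):
    """Return the distinct rows as lists, in first-seen order (hypothetical helper)."""
    # dict keys are unique and keep insertion order; tuples make the rows hashable
    return [list(t) for t in dict.fromkeys(map(tuple, rows))]

print(unique_rows([[1, 2, 3], [1, 2, 3], [4, 5]]))
print(unique_rows([[1, 2, 3], [1, 2, 3]]))
```

Both calls return a list of lists, so the nesting depth is the same whether or not the input rows have equal length.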