
Numpy.unique - getting output with consistent depth

Say I have two NumPy arrays built from lists (recent NumPy versions require dtype=object for the ragged one):

r = np.array([[1,2,3],[1,2,3],[4,5]], dtype=object)
q = np.array([[1,2,3],[1,2,3]])

and I use numpy.unique to trim them down to just the unique lists:

np.unique(r)
array([[1, 2, 3], [4, 5]], dtype=object)
np.unique(q)
array([1, 2, 3])

You can see that the output from np.unique(q) has a different depth.

My question is: how can I keep a consistent "depth" for both of the above examples?

That is, so that np.unique(q) gives:

array([[1, 2, 3]], dtype=object)

This has to do with the way q is constructed. When you say

q = np.array([[1,2,3],[1,2,3]])

you are creating a 2-D array of shape (2, 3) that holds 6 integers:

In [77]: q.size
Out[77]: 6

When you apply np.unique to this array, it flattens the input by default, so each of the 6 integers is treated as a separate value and only the unique ones are returned.

If you instead create q as an array with just 2 items, each of which is a Python list:
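As a quick check of this point, here is a minimal sketch (the variable name flat_unique is only illustrative). It shows that np.array merges equal-length lists into one 2-D integer array, and that np.unique without an axis argument works on the flattened values.

```python
import numpy as np

# Equal-length lists are merged into a single 2-D integer array.
q = np.array([[1, 2, 3], [1, 2, 3]])
print(q.shape)  # (2, 3)
print(q.size)   # 6

# Without an axis argument, np.unique works on the flattened array,
# so it sees six separate integers rather than two lists.
flat_unique = np.unique(q)
print(flat_unique)  # [1 2 3]
```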

In [78]: q = np.empty(2, dtype='object')

In [79]: q[:] = [[1,2,3],[1,2,3]]

In [80]: q.size
Out[80]: 2

Then np.unique returns the desired result:

In [81]: np.unique(q)
Out[81]: array([[1, 2, 3]], dtype=object)

The difference is perhaps clearer if you start with a different q:

In [20]: q = np.array([[1,2,3],[1,2,4]])

In [21]: q2 = np.empty(2, dtype='object')

In [22]: q2[:] = [[1,2,3],[1,2,4]]

In [23]: q
Out[23]: 
array([[1, 2, 3],
       [1, 2, 4]])

In [24]: q2
Out[24]: array([[1, 2, 3], [1, 2, 4]], dtype=object)

These two arrays, q and q2, look similar, but they behave differently.

q is an array of shape (2, 3) with 6 values, which are ints.

q2 is an array of shape (2,) with 2 values, which are Python lists.

When you apply np.unique to q, it finds the unique values among the 6 ints.

When you apply np.unique to q2, it finds the unique values among the 2 lists.

In [25]: np.unique(q)
Out[25]: array([1, 2, 3, 4])

In [26]: np.unique(q2)
Out[26]: array([[1, 2, 3], [1, 2, 4]], dtype=object)

Your alternative is really just making np.unique(q) 2-dimensional.

In [27]: np.array([np.unique(q).tolist()],dtype='object')
Out[27]: array([[1, 2, 3, 4]], dtype=object)

If that is what you want to do, then you could instead use np.atleast_2d:

In [28]: np.atleast_2d(np.unique(q))
Out[28]: array([[1, 2, 3, 4]])
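If q really is a rectangular numeric array, another option that may be worth knowing is the axis argument of np.unique (available in NumPy 1.13 and later), which treats each row as one item. This is a sketch of an alternative, not part of the answer above. The result is an ordinary integer array rather than an object array, and the approach does not apply to ragged data such as r. The name q_mixed is only illustrative.

```python
import numpy as np

# Two identical rows collapse to one row, and the result stays 2-D.
q = np.array([[1, 2, 3], [1, 2, 3]])
unique_rows = np.unique(q, axis=0)
print(unique_rows)        # [[1 2 3]]
print(unique_rows.shape)  # (1, 3)

# Distinct rows are kept and returned in sorted (lexicographic) order.
q_mixed = np.array([[1, 2, 4], [1, 2, 3], [1, 2, 4]])
unique_mixed = np.unique(q_mixed, axis=0)
print(unique_mixed)
```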

If you want the result to be

array([[1, 2, 3], [1, 2, 4]], dtype=object)

then you have to construct q as a 2-element array of dtype object. Initializing it with

q = np.empty(2, dtype='object')

and then filling it with the lists is the easiest way I know to achieve this.
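The construction can be wrapped in a small helper so that both example inputs go through the same path. The following is only a sketch: as_object_array is a hypothetical name, not a NumPy function, and it fills the array element by element so that each list is stored as a single item.

```python
import numpy as np

def as_object_array(lists):
    """Return a 1-D object array whose elements are the given lists.

    Hypothetical helper for illustration; not part of NumPy.
    """
    out = np.empty(len(lists), dtype=object)
    for i, item in enumerate(lists):
        out[i] = item  # integer indexing stores the list itself
    return out

r = as_object_array([[1, 2, 3], [1, 2, 3], [4, 5]])
q = as_object_array([[1, 2, 3], [1, 2, 3]])

# Both results are 1-D object arrays of lists, so the "depth" matches.
unique_r = np.unique(r)
unique_q = np.unique(q)
print(unique_r.shape, unique_q.shape)  # (2,) (1,)
```

Because every input becomes a 1-D array of list objects, np.unique compares whole lists in both cases, regardless of whether the lists happen to have equal lengths.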


And if you find yourself dealing with NumPy arrays of dtype object, ask yourself if you might not be better off with plain Python objects:

In [32]: set(map(tuple, ([[1, 2, 3], [1, 2, 4]])))
Out[32]: {(1, 2, 3), (1, 2, 4)}

In [33]: %timeit set(map(tuple, ([[1, 2, 3], [1, 2, 4]])))
1000000 loops, best of 3: 1.07 µs per loop

In [34]: %timeit np.unique(q2)
100000 loops, best of 3: 13.1 µs per loop
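If the result should come back as a list of lists rather than a set of tuples, the same idea can be sketched as below. unique_lists is a hypothetical helper name, and keeping first-seen order is a design choice of this sketch, not something the answer above specifies.

```python
def unique_lists(lists):
    """Return the distinct lists, keeping first-seen order.

    Hypothetical helper for illustration.
    """
    # Lists are unhashable, so tuples serve as dict keys;
    # dicts preserve insertion order in Python 3.7+.
    seen = dict.fromkeys(tuple(item) for item in lists)
    return [list(key) for key in seen]

print(unique_lists([[1, 2, 3], [1, 2, 3], [4, 5]]))  # [[1, 2, 3], [4, 5]]
print(unique_lists([[1, 2, 3], [1, 2, 3]]))          # [[1, 2, 3]]
```

The return value is always a list of lists, so the nesting depth is the same for both example inputs.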


 