
Grab unique tuples in python list, irrespective of order

I have a python list:

[ (2,2),(2,3),(1,4),(2,2), etc...]

What I need is some kind of function that reduces it to its unique components... which would be, in the above list:

[ (2,2),(2,3),(1,4) ]

numpy's unique does not quite do this. I can think of a workaround: convert my tuples to numbers, [22, 23, 14, etc.], find the uniques, and work back from there, but I worry the complexity will get out of hand. Is there a function that will do what I am trying to do with tuples?


Here is a sample of code that demonstrates the problem:

    import numpy as np

    x = [(2, 2), (2, 2), (2, 3)]
    y = np.unique(x)

This returns y: [2 3], because np.unique flattens the input before finding the unique values.

And here is a solution that demonstrates the fix:

    x = [(2, 2), (2, 2), (2, 3)]
    y = list(set(x))

This returns y: [(2, 2), (2, 3)] (a set does not preserve order, so the order may vary).

If order does not matter

If the order of the result is not critical, you can convert your list to a set (because tuples are hashable) and convert the set back to a list:

>>> l = [(2,2),(2,3),(1,4),(2,2)]
>>> list(set(l))
[(2, 3), (1, 4), (2, 2)]

If order matters

(UPDATE)

As of CPython 3.6 (an implementation detail there, guaranteed by the language from Python 3.7), regular dictionaries remember their insertion order, so you can simply write:

>>> l = [(2,2),(2,3),(1,4),(2,2)]
>>> list(dict.fromkeys(l))
[(2, 2), (2, 3), (1, 4)]

(OLD ANSWER)

If the order is important, the canonical way to filter the duplicates is this:

>>> seen = set()
>>> result = []
>>> for item in l:
...     if item not in seen:
...         seen.add(item)
...         result.append(item)
... 
>>> result
[(2, 2), (2, 3), (1, 4)]
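The loop above is often wrapped in a small reusable helper. A minimal sketch (the name `unique_ordered` is just illustrative):

```python
def unique_ordered(items):
    """Return the unique items in first-seen order (items must be hashable)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(unique_ordered([(2, 2), (2, 3), (1, 4), (2, 2)]))  # [(2, 2), (2, 3), (1, 4)]
```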

Finally, a little slower and a bit more hackish, you can abuse an OrderedDict as an ordered set:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(l))  # in Python 3, .keys() returns a view, so wrap in list()
[(2, 2), (2, 3), (1, 4)]

Using a set will remove duplicates, and you can create a list from it afterwards:

>>> list(set([ (2,2),(2,3),(1,4),(2,2) ]))
[(2, 3), (1, 4), (2, 2)]

You could simply do:

    y = np.unique(x, axis=0)
    z = [tuple(i) for i in y]

The reason is that a list of tuples is interpreted by numpy as a 2D array. By setting axis=0, you ask numpy not to flatten the array and to return its unique rows instead (note that np.unique also sorts the rows).
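To make that concrete, here is a small sketch of both behaviors. The rows are converted back to plain int tuples at the end, since np.unique returns numpy scalars:

```python
import numpy as np

x = [(2, 2), (2, 2), (2, 3)]
a = np.asarray(x)        # the list of tuples becomes a 2D array
print(a.shape)           # (3, 2)

print(np.unique(a))      # [2 3]  (flattened first, uniques over all values)

rows = np.unique(a, axis=0)             # unique rows, sorted lexicographically
z = [tuple(map(int, r)) for r in rows]
print(z)                 # [(2, 2), (2, 3)]
```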

set() will remove all duplicates, and you can then put it back to a list:

unique = list(set(mylist))

Using set(), however, will kill your ordering. If the order matters, you can use a list comprehension that checks whether the value already appears earlier in the list:

unique = [v for i,v in enumerate(mylist) if v not in mylist[:i]]

That comprehension re-slices the list at every step, however, so a plain loop is a little faster:

unique = []
for tup in mylist:
    if tup not in unique:
        unique.append(tup)
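Note that both of these order-preserving variants scan the already-collected items for every element, so they are quadratic; for large lists, the set- and dict-based approaches shown earlier are faster. A quick sketch checking that the variants agree on sample data:

```python
mylist = [(2, 2), (2, 3), (1, 4), (2, 2)]

# List comprehension: copies and scans the prefix for every element (quadratic).
a = [v for i, v in enumerate(mylist) if v not in mylist[:i]]

# Explicit loop: `tup not in unique` is also a linear scan (quadratic).
b = []
for tup in mylist:
    if tup not in b:
        b.append(tup)

# dict.fromkeys (Python 3.7+): hash lookups, roughly linear time.
c = list(dict.fromkeys(mylist))

assert a == b == c == [(2, 2), (2, 3), (1, 4)]
```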
