
Remove duplicates in a tuple of tuples

I have the following tuple of tuples:

# my Noah's Ark    
myanimals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe'),   ... ,('platypus', 'callitrix'))

Since I want a unique list of 2-tuples of animals, the pair ('platypus', 'callitrix') is considered a duplicate of ('callitrix', 'platypus').

How can I elegantly remove from myanimals (with a minimum of code) every pair (b, a) that duplicates a pair (a, b)?

I'm gonna answer in two parts:

  1. Not strictly an answer to your question, but a suggestion that would make working with this much easier: if your code allows for using sets instead of tuples, then you can use the keyword in to check what you need:
myanimals = ({'cat', 'dog'}, {'callitrix', 'platypus'}, {'anaconda', 'python'}, {'mouse', 'girafe'},   ... , {'platypus', 'callitrix'})
{'platypus', 'callitrix'} in myanimals # returns True, since {'a', 'b'}=={'b', 'a'}

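So, making a set of sets will make it so that duplicates are automatically removed. A plain set is not hashable and cannot be stored inside another set, so the inner sets have to be frozensets:

myanimals = {frozenset({'cat', 'dog'}), frozenset({'callitrix', 'platypus'}), frozenset({'anaconda', 'python'}), frozenset({'mouse', 'girafe'}),   ..., frozenset({'platypus', 'callitrix'})}

This will automatically remove the duplicate {'platypus', 'callitrix'}.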

However, doing this means that a pair cannot contain the same animal twice, since {'a', 'a'} is simply {'a'}.
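A minimal sketch of the set-based idea, assuming frozenset is acceptable in place of set. The sample data is shortened from the question (the elided entries are left out), and the names plain_sets_work and unique_pairs are only illustrative:

```python
# Shortened sample data; the elided entries from the question are omitted.
myanimals = (('cat', 'dog'), ('callitrix', 'platypus'), ('platypus', 'callitrix'))

# A plain set is unhashable, so it cannot be an element of another set.
try:
    {{'cat', 'dog'}}
    plain_sets_work = True
except TypeError:
    plain_sets_work = False

# frozenset is hashable, so a set of frozensets drops reversed pairs.
unique_pairs = {frozenset(pair) for pair in myanimals}

print(plain_sets_work)                                       # False
print(len(unique_pairs))                                     # 2
print(frozenset(('platypus', 'callitrix')) in unique_pairs)  # True
print(len(frozenset(('a', 'a'))))                            # 1, the pair collapses
```

The result is unordered, so this only fits if neither the order of the pairs nor the order inside each pair matters.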

  2. Actually using the tuples is a bit more cumbersome. Since tuples are immutable, you are gonna need to create a new one from scratch, and in doing so, filter out duplicates:
myanimals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe'),   ... ,('platypus', 'callitrix'))
myanimals_clean = []
for pair in myanimals:
    # keep the pair only if neither it nor its reverse has been kept already
    if pair not in myanimals_clean and (pair[1], pair[0]) not in myanimals_clean:
        myanimals_clean.append(pair)
myanimals_clean = tuple(myanimals_clean)  # back to a tuple of tuples

You can clean this up a bit using itertools.permutations() , but I don't think it's worth the hassle of the extra import.
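For reference, the itertools.permutations() variant could look roughly like this. It is only a sketch on shortened sample data; its main benefit is that it also covers tuples longer than two items:

```python
from itertools import permutations

# Shortened sample data; the elided entries from the question are omitted.
myanimals = (('cat', 'dog'), ('callitrix', 'platypus'),
             ('mouse', 'girafe'), ('platypus', 'callitrix'))

myanimals_clean = []
for pair in myanimals:
    # permutations(pair) yields every ordering of the tuple, including pair itself.
    if not any(p in myanimals_clean for p in permutations(pair)):
        myanimals_clean.append(pair)

myanimals_clean = tuple(myanimals_clean)
print(myanimals_clean)
# (('cat', 'dog'), ('callitrix', 'platypus'), ('mouse', 'girafe'))
```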

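Finally, you can do a hybrid of both answers: turn each pair into a frozenset to make the check, then turn the surviving pairs back into tuples. Note that the order of the two animals inside each resulting tuple is not guaranteed to match the original:

myanimals = tuple(dict.fromkeys(frozenset(pair) for pair in myanimals))  # unique frozensets, first-seen order
myanimals = tuple(tuple(pair) for pair in myanimals)  # back to a tuple of tuples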

You could use a set on sorted tuple values or convert the list to a dictionary where the key is the tuple in sorted order. This will leave only one value per combination:

list({*map(tuple,map(sorted,myanimals))})

or

list(dict(zip(map(tuple,map(sorted,myanimals)),myanimals)).values())

Broken down

[*map(sorted,myanimals)] # sorted tuples

# [['cat', 'dog'], ['callitrix', 'platypus'], ['anaconda', 'python'], ['girafe', 'mouse'], ['callitrix', 'platypus']]

# notice that both ('callitrix', 'platypus') and ('platypus', 'callitrix')
# are converted to ['callitrix', 'platypus']

Since this gives a list of lists and dictionary keys need to be hashable, we convert the items to tuples:

[*map(tuple,map(sorted,myanimals))]

# [('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('girafe', 'mouse'), ('callitrix', 'platypus')]

Those can already be converted to a list of unique pairs by placing it in a set and converting the set back to a list:

list({*map(tuple,map(sorted,myanimals))})

# [('girafe', 'mouse'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('cat', 'dog')]
# (a set is unordered, so the order of the pairs may differ on your machine)
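If the order in which the pairs first appear matters, one possible variation (not part of the approach above) is to swap the set for dict.fromkeys, which keeps keys unique and in insertion order on Python 3.7+. The name unique_sorted is only illustrative:

```python
# Sample data from the question, without the elided entries.
myanimals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'),
             ('mouse', 'girafe'), ('platypus', 'callitrix'))

# dict keys are unique and keep insertion order, unlike set elements.
unique_sorted = list(dict.fromkeys(tuple(sorted(pair)) for pair in myanimals))
print(unique_sorted)
# [('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('girafe', 'mouse')]
```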

If you don't care about the original order of values in each tuple, you can stop there. But, if you need ('mouse','girafe') to remain in that order, then we need an extra step to separate uniqueness filtering from tuple contents. This is where the dictionary comes in. We will want to use these sorted tuples as keys but retain the original order as values. The zip function allows this by combining the key part with the original tuples:

[*zip(map(tuple,map(sorted,myanimals)),myanimals)]

# [(('cat', 'dog'), ('cat', 'dog')), (('callitrix', 'platypus'), ('callitrix', 'platypus')), (('anaconda', 'python'), ('anaconda', 'python')), (('girafe', 'mouse'), ('mouse', 'girafe')), (('callitrix', 'platypus'), ('platypus', 'callitrix'))]

Feeding that into a dictionary will only keep the last value for each distinct key and we can simply pick up the values to form the resulting list of tuples:

list(dict(zip(map(tuple,map(sorted,myanimals)),myanimals)).values())
  
# [('cat', 'dog'), ('platypus', 'callitrix'), ('anaconda', 'python'), ('mouse', 'girafe')]

Note that the above selected ('platypus', 'callitrix') over ('callitrix', 'platypus') because a dictionary retains the last value assigned to a duplicate key.

Alternatively

If you need the first occurrence to be kept, you can use a different approach that progressively fills a set with both orders of each tuple and keeps a tuple only the first time it is added to the set.

[t for s in [set()] for t in myanimals
   if t not in s and not s.update((t,t[::-1]))]
  
# [('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe')]
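The same first-occurrence filter can be spelled out as a plain loop, which may be easier to read than the comprehension. This is a sketch; the function name unique_pairs is hypothetical and the sample data leaves out the elided entries:

```python
def unique_pairs(pairs):
    """Keep the first occurrence of each pair, treating (a, b) and (b, a) as equal."""
    seen = set()      # holds every kept pair in both orders
    result = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            seen.add(pair[::-1])  # the reversed pair now counts as a duplicate
            result.append(pair)
    return tuple(result)


myanimals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'),
             ('mouse', 'girafe'), ('platypus', 'callitrix'))
print(unique_pairs(myanimals))
# (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe'))
```

Membership tests on a set take constant time on average, so this stays linear in the number of pairs, unlike checking against a growing list.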
