
Remove duplicates in a tuple of tuples

I have the following tuple of tuples:

# my Noah's Ark    
myanimals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe'),   ... ,('platypus', 'callitrix'))

Since I want a list of unique 2-tuples of animals, the pair ('platypus', 'callitrix') is considered a duplicate of ('callitrix', 'platypus').

How can I elegantly remove from myanimals, with a minimum of code, every pair (b, a) that duplicates a pair (a, b)?

I'll answer in two parts:

  1. Not strictly an answer to your question, but a suggestion that would make working with this much easier: if your code allows using sets instead of tuples, then you can use the keyword in to check what you need:
myanimals = ({'cat', 'dog'}, {'callitrix', 'platypus'}, {'anaconda', 'python'}, {'mouse', 'girafe'},   ..., {'platypus', 'callitrix'})
{'platypus', 'callitrix'} in myanimals # returns True, since {'a', 'b'}=={'b', 'a'}

Going one step further, a set of frozensets removes duplicates automatically. (A set of plain sets is not possible: sets are unhashable, so they cannot be elements of another set. frozenset is the hashable equivalent.)

myanimals = {frozenset({'cat', 'dog'}), frozenset({'callitrix', 'platypus'}), frozenset({'anaconda', 'python'}), frozenset({'mouse', 'girafe'}),   ..., frozenset({'platypus', 'callitrix'})}

This automatically removes the duplicate frozenset({'platypus', 'callitrix'}).

However, this means a pair cannot consist of the same animal twice, since {'a', 'a'} is simply {'a'}.

  2. Actually using the tuples is a bit more cumbersome. Since tuples are immutable, you need to build a new one from scratch, filtering out duplicates as you go:
myanimals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe'),   ... ,('platypus', 'callitrix'))
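myanimals_clean = []
for pair in myanimals:
    # skip the pair if it, or its reverse (b, a), has already been kept
    if pair not in myanimals_clean and (pair[1], pair[0]) not in myanimals_clean:
        myanimals_clean.append(pair)
myanimals_clean = tuple(myanimals_clean)  # back to a tuple of tuples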

You can clean this up a bit using itertools.permutations(), but I don't think it's worth the hassle of the extra import.
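A runnable sketch of both parts, assuming only the five pairs visible in the question (the elided `...` entries are left out). `dedupe_frozensets` and `dedupe_loop` are hypothetical helper names. Part 1 uses frozenset because plain sets are unhashable and cannot be set elements.

```python
# Sample data: only the five pairs shown in the question (the "..." entries are omitted).
ANIMALS = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'),
           ('mouse', 'girafe'), ('platypus', 'callitrix'))


def dedupe_frozensets(pairs):
    """Part 1: a set of frozensets, so the order inside a pair is ignored."""
    return {frozenset(pair) for pair in pairs}


def dedupe_loop(pairs):
    """Part 2: build a new tuple, skipping (b, a) when (a, b) was already kept."""
    clean = []
    for pair in pairs:
        if pair not in clean and (pair[1], pair[0]) not in clean:
            clean.append(pair)
    return tuple(clean)


print(len(dedupe_frozensets(ANIMALS)))  # 4
print(dedupe_loop(ANIMALS))
# (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe'))

# The caveat from part 1: a pair of the same animal collapses to a single element.
print(frozenset(('cat', 'cat')))  # frozenset({'cat'})
```

Note that `pair not in clean` scans the list on every iteration, so the loop is quadratic in the number of pairs; that is fine for a small ark.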

Finally, you can combine both answers: use sets for the duplicate check, while the result stays a tuple of tuples:

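myanimals_sets = tuple(set(pair) for pair in myanimals)  # order within each pair no longer matters
# keep a pair only if no equal set appeared earlier; the original tuple is kept as-is
myanimals = tuple(pair for i, pair in enumerate(myanimals) if myanimals_sets[i] not in myanimals_sets[:i])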
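Checking `in` against a tuple of sets compares element by element, so the hybrid above is quadratic in the number of pairs. A minimal linear-time sketch of the same idea, assuming the pair elements are hashable: remember each pair's frozenset in a `seen` set and keep the original tuple the first time its frozenset appears. `dedupe_pairs` is a hypothetical helper name.

```python
def dedupe_pairs(pairs):
    """Keep the first occurrence of each unordered pair, in its original orientation."""
    seen = set()     # frozensets are hashable, so they can be stored in a set
    result = []
    for pair in pairs:
        key = frozenset(pair)  # ('a', 'b') and ('b', 'a') give the same key
        if key not in seen:
            seen.add(key)
            result.append(pair)
    return tuple(result)


animals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'),
           ('mouse', 'girafe'), ('platypus', 'callitrix'))
print(dedupe_pairs(animals))
# (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe'))
```

Because the original tuples are returned rather than the frozensets, a same-animal pair such as ('cat', 'cat') survives intact, unlike in part 1.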

You could use a set on sorted tuple values or convert the list to a dictionary where the key is the tuple in sorted order. This will leave only one value per combination:

list({*map(tuple,map(sorted,myanimals))})

or

list(dict(zip(map(tuple,map(sorted,myanimals)),myanimals)).values())

Broken down:

[*map(sorted,myanimals)] # each pair sorted; sorted() returns lists

# [['cat', 'dog'], ['callitrix', 'platypus'], ['anaconda', 'python'], ['girafe', 'mouse'], ['callitrix', 'platypus']]

# notice that both ('callitrix', 'platypus') and ('platypus', 'callitrix')
# are converted to ['callitrix', 'platypus']

Since this gives a list of lists, and set elements and dictionary keys must be hashable, we convert the items to tuples:

[*map(tuple,map(sorted,myanimals))]

# [('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('girafe', 'mouse'), ('callitrix', 'platypus')]
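The tuple conversion matters because lists cannot be hashed. A small check, using a hypothetical `is_hashable` helper that simply tries the built-in `hash()`:

```python
def is_hashable(value):
    """Return True if value can be used as a set element or dict key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


pair_as_list = sorted(('platypus', 'callitrix'))  # sorted() always returns a list
pair_as_tuple = tuple(pair_as_list)

print(pair_as_list, is_hashable(pair_as_list))    # ['callitrix', 'platypus'] False
print(pair_as_tuple, is_hashable(pair_as_tuple))  # ('callitrix', 'platypus') True
```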

Those can already be converted to a list of unique pairs by placing them in a set and converting the set back to a list:

list({*map(tuple,map(sorted,myanimals))})

# [('girafe', 'mouse'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('cat', 'dog')]
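A set does not preserve order, which is why the output above is shuffled. If the order of first appearance matters, one swapped-in variant (not part of the original answer) is `dict.fromkeys()`, which drops repeated keys while keeping insertion order (guaranteed for dicts since Python 3.7). This sketch assumes the five pairs shown in the question:

```python
animals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'),
           ('mouse', 'girafe'), ('platypus', 'callitrix'))

# Each pair is sorted and turned into a tuple; dict.fromkeys() keeps only the
# first copy of each key, in the order the keys were first seen.
unique_sorted = list(dict.fromkeys(map(tuple, map(sorted, animals))))
print(unique_sorted)
# [('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('girafe', 'mouse')]
```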

If you don't care about the original order of values in each tuple, you can stop there. But if you need ('mouse', 'girafe') to remain in that order, then we need an extra step to separate uniqueness filtering from tuple contents. This is where the dictionary comes in. We want to use these sorted tuples as keys but retain the original tuples as values. The zip function allows this by combining the key part with the original tuples:

[*zip(map(tuple,map(sorted,myanimals)),myanimals)]

# [(('cat', 'dog'), ('cat', 'dog')), (('callitrix', 'platypus'), ('callitrix', 'platypus')), (('anaconda', 'python'), ('anaconda', 'python')), (('girafe', 'mouse'), ('mouse', 'girafe')), (('callitrix', 'platypus'), ('platypus', 'callitrix'))]

Feeding that into a dictionary keeps only the last value for each distinct key (while each key stays at the position where it first appeared), and we can simply pick up the values to form the resulting list of tuples:

list(dict(zip(map(tuple,map(sorted,myanimals)),myanimals)).values())

# [('cat', 'dog'), ('platypus', 'callitrix'), ('anaconda', 'python'), ('mouse', 'girafe')]
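To make the last-occurrence behavior explicit, here is a loop sketch equivalent to the `dict(zip(...))` one-liner, assuming the five pairs shown in the question (`dedupe_keep_last` is a hypothetical name):

```python
def dedupe_keep_last(pairs):
    """Loop equivalent of the dict(zip(...)) one-liner: later duplicates win."""
    by_key = {}
    for pair in pairs:
        key = tuple(sorted(pair))  # orientation-independent, hashable key
        by_key[key] = pair         # overwrites the value; the key keeps its first position
    return list(by_key.values())


animals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'),
           ('mouse', 'girafe'), ('platypus', 'callitrix'))
print(dedupe_keep_last(animals))
# [('cat', 'dog'), ('platypus', 'callitrix'), ('anaconda', 'python'), ('mouse', 'girafe')]
```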

Alternatively

Note that the above selected ('platypus', 'callitrix') over ('callitrix', 'platypus') because it retains the last occurrence of duplicate entries.

If you need the first occurrence to be kept, you can use a different approach: progressively add both orientations of each tuple to a set, and keep a tuple only if it was not already in the set.

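# s is created once; s.update() returns None, so "not s.update(...)" is always
# True and only serves to record t and its reverse t[::-1] as seen
[t for s in [set()] for t in myanimals
   if t not in s and not s.update((t,t[::-1]))]
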
# [('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe')]
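The comprehension packs the logic into a single expression. A readable loop sketch of the same first-occurrence idea, assuming hashable pairs and the five pairs shown in the question (`dedupe_keep_first` is a hypothetical name):

```python
def dedupe_keep_first(pairs):
    """Keep the first occurrence of each pair by recording both orientations."""
    seen = set()
    result = []
    for t in pairs:
        if t not in seen:
            seen.update((t, t[::-1]))  # mark (a, b) and (b, a) as seen
            result.append(t)
    return result


animals = (('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'),
           ('mouse', 'girafe'), ('platypus', 'callitrix'))
print(dedupe_keep_first(animals))
# [('cat', 'dog'), ('callitrix', 'platypus'), ('anaconda', 'python'), ('mouse', 'girafe')]
```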

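Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.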

 
© 2020-2024 STACKOOM.COM