
Remove duplicates in a list of tuples based on max value

Suppose I have a list of tuples like this:

[('Machine1', 88), ('Machine2', 90), ('Machine3', 78), ('Machine1', 90), ('Machine3', 95)]

I want to filter the list so that only the highest value for each machine name remains. In this example the filtered list would be:

[('Machine2', 90), ('Machine1', 90), ('Machine3', 95)]

I basically want to remove duplicates, keeping the highest value. I know set only removes exact duplicates, so it won't work here. Another method I considered is to use a dictionary and update it while iterating through the list whenever a higher value is seen. However, what is a more Pythonic way to approach this?

One solution with a simple dict

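l = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78), ('Machine1', 90), ('Machine3', 95)]
d = {}
for machine, value in l:
    # Keep the larger of the stored value and the new one.
    d[machine] = max(d.get(machine, -float('inf')), value)
print(list(d.items()))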

Output

[('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
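The same dictionary idea can be wrapped in a small reusable function. The sketch below is one possible variant, not part of the original answer: the name max_per_key is made up for illustration, and it swaps the -float('inf') sentinel for an explicit membership check so that it also works for values that cannot be compared with a float. It assumes Python 3.7+, where dicts preserve insertion order.

```python
def max_per_key(pairs):
    """Return (key, largest value) tuples, with keys in first-seen order."""
    best = {}
    for key, value in pairs:
        # Store the value if the key is new or the value beats the current best.
        if key not in best or value > best[key]:
            best[key] = value
    return list(best.items())


machines = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78),
            ('Machine1', 90), ('Machine3', 95)]

print(max_per_key(machines))
# [('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
```

This is a single pass over the input and stores only one value per key.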

Using pandas (for fun)

>>> import pandas as pd
>>> list(pd.DataFrame(l).groupby(0).max().to_dict()[1].items())
[('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]

Here's one solution using collections.defaultdict. The idea is to iterate over your list of tuples and append each value to a list keyed by machine name. Then use zip with map + max to create the desired result.

from collections import defaultdict

L = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78),
     ('Machine1', 90), ('Machine3', 95)]

d = defaultdict(list)

for name, num in L:
    d[name].append(num)

res = list(zip(d, map(max, d.values())))

Result

[('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
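To see why the zip/map step works, it can help to look at the intermediate grouping. The sketch below repeats the defaultdict approach with an extra print; the variable name groups is chosen here for illustration only.

```python
from collections import defaultdict

L = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78),
     ('Machine1', 90), ('Machine3', 95)]

groups = defaultdict(list)
for name, num in L:
    groups[name].append(num)

# Each name now maps to every value seen for it.
print(dict(groups))
# {'Machine1': [88, 90], 'Machine2': [90], 'Machine3': [78, 95]}

# Iterating a dict yields its keys, and values() follows the same order,
# so zip pairs each name with the max of its own list.
res = list(zip(groups, map(max, groups.values())))
print(res)
# [('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
```

Note that this keeps every value in memory until the final step, unlike the plain dict approach, which stores only the running maximum.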

You can also use groupby from itertools. The input must be sorted by key first, because groupby only groups consecutive items:

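>>> import itertools as it
>>> data = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78), ('Machine1', 90), ('Machine3', 95)]
>>> [(k, max(list(zip(*g))[1])) for k, g in it.groupby(sorted(data), key=lambda m: m[0])]
[('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]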

Remember that sorted(data) orders the tuples by name and then by value, so the last tuple in each group already holds the maximum. You could therefore also do:

>>> [(k, list(zip(*g))[1][-1]) for k, g in it.groupby(sorted(data), key=lambda m: m[0])]
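The observation that the sorted data puts the largest value last for each name also allows a shorter variant without groupby. This is a sketch that was not in the original answers: building a dict from the sorted pairs keeps only the last value seen per key, which here is the maximum. It assumes the values for a given name are mutually comparable and that every element is a two-item tuple.

```python
data = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78),
        ('Machine1', 90), ('Machine3', 95)]

# sorted() orders by name, then by value, so the largest value for each
# name comes last; dict() overwrites earlier entries with later ones.
result = list(dict(sorted(data)).items())

print(result)
# [('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
```

Like the groupby versions, this costs a sort (O(n log n)), whereas the plain dict loop is a single O(n) pass.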
