Remove duplicates in a list of tuples based on max value
Suppose I have a list of tuples like this:
[('Machine1', 88), ('Machine2', 90), ('Machine3', 78), ('Machine1', 90), ('Machine3', 95)]
I want to filter the list so that it keeps only the highest value for each machine name. In this example, the filtered list would be:
[('Machine2', 90),('Machine1', 90), ('Machine3', 95)]
I basically want to remove duplicates, keeping the highest value for each name. I know set only removes exact duplicates, so it won't work here. Another method I considered is to use a dictionary and update it while iterating through the list whenever a higher value is seen. However, is there a more Pythonic way to approach this?
One solution with a simple dict:
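# l is the list of (machine, value) tuples from the question
d = {}
for machine, value in l:
    # keep the larger of the stored value and the new one
    d[machine] = max(d.get(machine, -float('inf')), value)
print(list(d.items()))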
Output:
[('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
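The loop above relies on a list named l that the snippet does not define. As a minimal, self-contained sketch of the same dictionary idea, the version below wraps it in a function; the names max_per_key and records are hypothetical and chosen only for illustration. It assumes Python 3.7 or later, where dicts preserve insertion order, so each key keeps the position of its first appearance.

```python
def max_per_key(pairs):
    """Return (key, max value) tuples, ordered by first appearance of each key."""
    best = {}
    for key, value in pairs:
        # store the value if the key is new or the value beats the current best
        if key not in best or value > best[key]:
            best[key] = value
    return list(best.items())


# sample data taken from the question
records = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78),
           ('Machine1', 90), ('Machine3', 95)]

print(max_per_key(records))
# [('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
```

Checking for the key explicitly avoids the -float('inf') sentinel, so the same function should also work for values that are not numbers, as long as they can be compared with >.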
Using pandas (for fun):
>>> import pandas as pd
>>> list(pd.DataFrame(l).groupby(0).max().to_dict()[1].items())
[('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
Here's one solution using collections.defaultdict. The idea is to iterate over your list of tuples and append each value to a list keyed by machine name. Then use zip with map and max to create the desired result.
from collections import defaultdict

L = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78),
     ('Machine1', 90), ('Machine3', 95)]

# collect every value seen for each machine name
d = defaultdict(list)
for name, num in L:
    d[name].append(num)

# pair each name with the maximum of its values
res = list(zip(d, map(max, d.values())))
Result:
[('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
You can also use the groupby function from itertools (here data is the list of tuples from the question):
>>> import itertools as it
>>> [ (k, max( list(zip(*g))[1]) ) for k,g in it.groupby(sorted(data), key=lambda m: m[0])]
Remember that the data is sorted: tuples are ordered by name and then by value, so the last item in each group already holds the maximum. You could therefore also do:
>>> [ (k, list(zip(*g))[1][-1] ) for k,g in it.groupby(sorted(data), key=lambda m: m[0])]
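For reference, here is a self-contained sketch of the groupby approach that can be run as-is. The function name max_by_group is hypothetical, and operator.itemgetter is swapped in for the lambda purely as a readability choice. Note that groupby only merges adjacent items, which is why the input is sorted by name first.

```python
from itertools import groupby
from operator import itemgetter

# sample data taken from the question
data = [('Machine1', 88), ('Machine2', 90), ('Machine3', 78),
        ('Machine1', 90), ('Machine3', 95)]


def max_by_group(pairs):
    """Return (key, max value) tuples, sorted by key."""
    # groupby only groups consecutive items, so sort by the key first
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, max(value for _, value in group))
            for key, group in groupby(ordered, key=itemgetter(0))]


print(max_by_group(data))
# [('Machine1', 90), ('Machine2', 90), ('Machine3', 95)]
```

Unlike the dictionary versions, this returns the names in sorted order rather than in order of first appearance, and the sort makes it O(n log n) instead of O(n).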