Return lists that have the highest value per group

I currently have a list of locations that I would like to sort out.

The list looks like the following:

locations = [['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]]

The goal is to keep, for every location, the sublist with the highest value at index 1. The final result should look like the following:

correctList = [['Location 1', 5],['Location 2', 6],['Location 3', 5]]

When a location has several entries with the same integer value, there is no preference as to which one is kept.

The solution that I have now appends each location's entries to their own list based on name, and then applies max() to each of those lists.
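For reference, the approach described in the question could look roughly like the sketch below. This is a reconstruction under assumptions, not the asker's actual code: the names `locations`, `by_name` and `best` are made up for illustration.

```python
# Hypothetical reconstruction of the approach described in the question:
# collect the values for each location name, then take max() of each list.
locations = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4],
             ['Location 2', 6], ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

by_name = {}  # location name -> list of all values seen for that name
for name, value in locations:
    by_name.setdefault(name, []).append(value)

# One [name, max value] pair per location, in first-seen order
best = [[name, max(values)] for name, values in by_name.items()]
print(best)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
```

This works, but it stores every value before reducing, which the answers avoid.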

You can use itertools.groupby to select the list with the largest second element in each group, once the lists have been sorted by their first element:

from itertools import groupby

# l is the question's list of [location, value] pairs
s = sorted(l, key=lambda x: x[0])
[max(k) for i, k in groupby(s, key=lambda x: x[0])]
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

Where:

sorted(l, key=lambda x: x[0])

[['Location 1', 5],
 ['Location 1', 4],
 ['Location 1', 5],
 ['Location 2', 5],
 ['Location 2', 6],
 ['Location 2', 5],
 ['Location 3', 5],
 ['Location 3', 5]]

Note that max gives the desired output here because lists are compared element by element: within a group the first elements are equal, so the second element decides. For example:

max(['Location 1', 5], ['Location 1', 4], ['Location 1', 5])
#['Location 1', 5]
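If you would rather not rely on list comparison, a variant of the same groupby idea can state the comparison key explicitly. The sketch below is one possible way to write it, using `operator.itemgetter`; the names `rows`, `rows_sorted` and `result` are illustrative and not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

rows = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4],
        ['Location 2', 6], ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

# Lists compare element by element, which is why plain max() works in the answer
print(['Location 1', 5] > ['Location 1', 4])  # True

# groupby only merges adjacent items, so sort by location name first
rows_sorted = sorted(rows, key=itemgetter(0))

# For each group, pick the row whose value (index 1) is largest
result = [max(group, key=itemgetter(1))
          for _, group in groupby(rows_sorted, key=itemgetter(0))]
print(result)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
```

The explicit key also keeps working if the rows later gain extra fields that should not take part in the comparison.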

You can use collections.defaultdict for an O(n) solution:

from collections import defaultdict

L = [['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],
     ['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]]

dd = defaultdict(int)

for location, value in L:
    dd[location] = max(dd[location], value)

print(dd)
# defaultdict(<class 'int'>, {'Location 1': 5, 'Location 2': 6, 'Location 3': 5})

This gives a dictionary mapping each location to its maximum value. If you need a list of lists:

res = list(map(list, dd.items()))

print(res)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
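One caveat worth spelling out: `defaultdict(int)` starts every location at 0, so the loop above implicitly assumes the values are never negative. The sketch below illustrates the difference with made-up data (`readings` is hypothetical, not from the question) and shows one possible adjustment, starting from negative infinity instead.

```python
from collections import defaultdict

# Hypothetical data with negative values, for illustration only
readings = [['Location 1', -3], ['Location 2', -1], ['Location 1', -7]]

# With a default of 0, every maximum is wrongly reported as 0
dd_int = defaultdict(int)
for location, value in readings:
    dd_int[location] = max(dd_int[location], value)
print(dict(dd_int))
# {'Location 1': 0, 'Location 2': 0}

# Starting from negative infinity lets any real value win the comparison
dd_inf = defaultdict(lambda: float('-inf'))
for location, value in readings:
    dd_inf[location] = max(dd_inf[location], value)
print(list(map(list, dd_inf.items())))
# [['Location 1', -3], ['Location 2', -1]]
```

For the data in the question, where all values are positive, both versions give the same result.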

You could use a dictionary to compute the maximum value per location in O(n):

data = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4], ['Location 2', 6],
        ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

groups = {}
for location, value in data:
    if location not in groups:
        groups[location] = value
    else:
        groups[location] = max(groups[location], value)

result = [[location, value] for location, value in groups.items()]

print(result)

Output:

[['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

You can use pandas for this; it makes it very easy to group by one key and calculate something for each group:

import pandas as pd

df = pd.DataFrame([['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]],
                  columns=["location", "value"])
df.groupby("location").max()
#             value
# location         
# Location 1      5
# Location 2      6
# Location 3      5

If you absolutely need a list of lists afterwards, that is also possible:

df.groupby("location").max().reset_index().values.tolist()
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

Note that if this is the only thing you want to do with this data, pandas is probably overkill. But if you need to do more analysis on it, getting used to pandas can speed up a lot of things, since most of its methods are vectorized and implemented in C or Cython.
