Return lists that have the highest value per group

Question

I currently have a list of locations that I would like to sort out.

The list looks like the following:

list = [['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]]

The goal is to select, for every location, the highest value found at index 1 of its lists. The final result should look like the following:

correctList = [['Location 1', 5],['Location 2', 6],['Location 3', 5]]

When a location has several entries with the same integer value, it does not matter which one is kept.

My current solution appends each location to its own list based on its name, then calls max() on each of those per-location lists.
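
For reference, the approach described above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the names locations, values_by_location and correct_list are assumptions.

```python
from collections import defaultdict

locations = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4],
             ['Location 2', 6], ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

# Step 1: collect the values of each location in its own list.
values_by_location = defaultdict(list)
for name, value in locations:
    values_by_location[name].append(value)

# Step 2: take the maximum of each per-location list.
correct_list = [[name, max(values)] for name, values in values_by_location.items()]

print(correct_list)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
```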

Answer 1

You can use itertools.groupby to pick, for each location, the list with the largest second element. groupby only groups consecutive items, so the lists must first be sorted by their first element:

from itertools import groupby

# l is the list from the question
s = sorted(l, key=lambda x: x[0])
[max(group) for _, group in groupby(s, key=lambda x: x[0])]
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

Where:

sorted(l, key=lambda x: x[0])

[['Location 1', 5],
 ['Location 1', 4],
 ['Location 1', 5],
 ['Location 2', 5],
 ['Location 2', 6],
 ['Location 2', 5],
 ['Location 3', 5],
 ['Location 3', 5]]

Note that max gives the desired result when passed several lists, because lists are compared element by element and the first elements are identical within a group:

max(['Location 1', 5], ['Location 1', 4], ['Location 1', 5])
# ['Location 1', 5]
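
A possible variant, in case you would rather not rely on list comparison, is to tell max explicitly which element to compare. This is a sketch under my own naming (rows, rows_sorted and result are illustrative); operator.itemgetter is used in place of the lambdas:

```python
from itertools import groupby
from operator import itemgetter

rows = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4],
        ['Location 2', 6], ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

# Lists compare element by element, so with equal names the value decides.
print(['Location 1', 5] > ['Location 1', 4])  # True

# groupby only merges adjacent items, so sort by the grouping key first.
rows_sorted = sorted(rows, key=itemgetter(0))

# Compare only the value (index 1) when picking the maximum of each group.
result = [max(group, key=itemgetter(1))
          for _, group in groupby(rows_sorted, key=itemgetter(0))]

print(result)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
```

The explicit key should also keep working if the inner lists later gain extra fields that cannot be compared with each other.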

Answer 2

You can use collections.defaultdict for an O(n) solution. Note that defaultdict(int) starts every location at 0, so this assumes the values are non-negative:

from collections import defaultdict

L = [['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],
     ['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]]

dd = defaultdict(int)

for location, value in L:
    dd[location] = max(dd[location], value)

print(dd)
# defaultdict(<class 'int'>, {'Location 1': 5, 'Location 2': 6, 'Location 3': 5})

This gives a dictionary mapping. If you are keen on a list of lists:

res = list(map(list, dd.items()))

print(res)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
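
One caveat worth illustrating: the default of 0 wins over any negative value. The sketch below uses made-up negative readings (the readings data and the names naive and best are assumptions, not part of the question) and shows one way around it with a plain dict and dict.get:

```python
from collections import defaultdict

# Hypothetical data with negative values, for illustration only.
readings = [['Location 1', -5], ['Location 2', -2], ['Location 1', -3]]

# defaultdict(int) silently starts each location at 0, which beats every negative value.
naive = defaultdict(int)
for location, value in readings:
    naive[location] = max(naive[location], value)
print(dict(naive))  # {'Location 1': 0, 'Location 2': 0}

# Falling back to the current value for unseen keys avoids the artificial 0.
best = {}
for location, value in readings:
    best[location] = max(best.get(location, value), value)
print(best)  # {'Location 1': -3, 'Location 2': -2}
```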

Answer 3

You could use a dictionary to compute the maximum value per location in O(n):

data = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4], ['Location 2', 6],
        ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

groups = {}
for location, value in data:
    if location not in groups:
        groups[location] = value
    else:
        groups[location] = max(groups[location], value)

result = [[location, value] for location, value in groups.items()]

print(result)

Output

[['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

Answer 4

You can use pandas for this. It makes it very easy to group by one key and calculate something for each group:

import pandas as pd

df = pd.DataFrame([['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]],
                  columns=["location", "value"])
df.groupby("location").max()
#             value
# location         
# Location 1      5
# Location 2      6
# Location 3      5

If you absolutely need a list of lists afterwards, that is also possible:

df.groupby("location").max().reset_index().values.tolist()
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

Note that if this is the only thing you want to do with this data, pandas is probably overkill. But if you need to do more analysis on it, getting used to pandas can speed up a lot of things, since many of its operations are vectorized and implemented in compiled code.
