
Get maximum tuples in list

I have a list of tuples that can be understood as key-value pairs, where a key can appear several times, possibly with different values, for example:

[(2,8),(5,10),(2,5),(3,4),(5,50)]

I now want to get a list of tuples with the highest value for each key, i.e.

[(2,8),(3,4),(5,50)]

The order of the keys is irrelevant.

How do I do that in an efficient way?

Sort the tuples, convert the sorted list to a dictionary, and take the items back out of it:

l = [(2,8),(5,10),(2,5),(3,4),(5,50)]
list(dict(sorted(l)).items())  # Python 3; in Python 2 the list() call is not needed
[(2, 8), (3, 4), (5, 50)]

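The idea is that sorting puts the tuples in ascending order, so when they are converted to a dictionary, each key is overwritten by successively larger values and ends up holding its maximum. The lower values for each key are filtered out along the way, and you just have to read the items back as tuples.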
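To make the overwriting visible, here is a small sketch of the same one-liner split into its intermediate steps. The names ordered and best are only illustrative and do not appear in the answer above.

```python
l = [(2, 8), (5, 10), (2, 5), (3, 4), (5, 50)]

# Tuples sort by first element, then by second, so for each key
# the tuple with the largest value comes last.
ordered = sorted(l)
# ordered == [(2, 5), (2, 8), (3, 4), (5, 10), (5, 50)]

# dict() keeps the last value it sees for a repeated key.
best = dict(ordered)
# best == {2: 8, 3: 4, 5: 50}

result = list(best.items())
```

Because the final value for each key is whichever pair comes last in the sorted list, the approach relies on the sort placing the maximum last.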

At its core, this problem is essentially about grouping the tuples based on their first element and then keeping only the maximum of each group.

Grouping can be done easily with a defaultdict. A detailed explanation of grouping with defaultdicts can be found in my answer here. In your case, we group the tuples by their first element and then use the max function to find the tuple with the largest number.

import collections

tuples = [(2,8),(5,10),(2,5),(3,4),(5,50)]

groupdict = collections.defaultdict(list)
for tup in tuples:
    group = tup[0]
    groupdict[group].append(tup)

result = [max(group) for group in groupdict.values()]
# result: [(2, 8), (5, 50), (3, 4)]

In your particular case, we can optimize the code a little bit by storing only the maximum 2nd element in the dict, rather than storing a list of all tuples and finding the maximum at the end:

tuples = [(2,8),(5,10),(2,5),(3,4),(5,50)]

groupdict = {}
for tup in tuples:
    group, value = tup

    if group in groupdict:
        groupdict[group] = max(groupdict[group], value)
    else:
        groupdict[group] = value

result = list(groupdict.items())
# result: [(2, 8), (5, 50), (3, 4)]

This keeps the memory footprint to a minimum, but only works for tuples with exactly 2 elements.
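If the tuples have more than two elements, one possible variation (a sketch, not part of the original answer) is to store the largest whole tuple per key instead of only the second element. This still keeps a single entry per key. The sample data with a third element and the name best are made up for illustration.

```python
tuples = [(2, 8, "a"), (5, 10, "b"), (2, 5, "c"), (3, 4, "d"), (5, 50, "e")]

best = {}
for tup in tuples:
    key = tup[0]
    # Tuples compare element by element; since the first elements are
    # equal within a group, the second element decides (ties fall
    # through to the later elements).
    if key not in best or tup > best[key]:
        best[key] = tup

result = list(best.values())
```

This assumes the later elements are comparable with each other whenever the second elements tie; otherwise the comparison can raise a TypeError.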


The grouping approach has a number of advantages over Netwave's solution:

  • It's more readable. Anyone who sees a defaultdict being instantiated knows that it'll be used to group data, and the use of the max function makes it easy to understand which tuples are kept. Netwave's one-liner is clever, but clever solutions are rarely easy to read.
  • Since the data doesn't have to be sorted, this runs in linear O(n) time instead of O(n log n).
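For anyone who wants to check this on their own data, here is a rough sketch that verifies both approaches pick the same tuples and times them with timeit. The function names by_sorting and by_grouping and the random test data are assumptions made for this example. Actual timings depend on the machine and input size, and the built-in sort is fast enough that the asymptotic difference may not show up on small inputs.

```python
import collections
import random
import timeit

def by_sorting(pairs):
    # Sort, then let dict() keep the last (largest) value per key.
    return list(dict(sorted(pairs)).items())

def by_grouping(pairs):
    # Group by first element, then keep the maximum of each group.
    groups = collections.defaultdict(list)
    for tup in pairs:
        groups[tup[0]].append(tup)
    return [max(group) for group in groups.values()]

random.seed(0)
data = [(random.randrange(1000), random.randrange(10**6)) for _ in range(100_000)]

# Key order differs between the two, so compare the sorted results.
same = sorted(by_sorting(data)) == sorted(by_grouping(data))

t_sort = timeit.timeit(lambda: by_sorting(data), number=5)
t_group = timeit.timeit(lambda: by_grouping(data), number=5)
print(f"same result: {same}, sorting: {t_sort:.3f}s, grouping: {t_group:.3f}s")
```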
