Sort & count list of 3-element tuples by second and third value of a tuple

I have a list of tuples like this:

l = [(1, 'project1', 'errorMessage1'),
    (2, 'project1', 'errorMessage1'), 
    (3, 'project2', 'errorMessage1'),
    (1, 'project3', 'errorMessage2')]

I would like to sum the first element over all tuples that share the same project and errorMessage, so that each distinct (project, errorMessage) pair appears once, like this:

[(3, 'project1', 'errorMessage1'),
 (3, 'project2', 'errorMessage1'),
 (1, 'project3', 'errorMessage2')]

I tried Counter and a few other things, but I don't see how I should approach this.

You could solve this using a dictionary to hold the sum of the counts:

l = [(1, 'project1', 'errorMessage1'),
    (2, 'project1', 'errorMessage1'), 
    (3, 'project2', 'errorMessage1'),
    (1, 'project3', 'errorMessage2')]

d = {}

for t in l:
    if t[1:] in d:
        d[t[1:]] += t[0]
    else:
        d[t[1:]] = t[0]

Output:

>>> d
{('project1', 'errorMessage1'): 3,
 ('project2', 'errorMessage1'): 3,
 ('project3', 'errorMessage2'): 1}

Add a list comprehension to reformat the result:

>>> [(v, *k) for k, v in d.items()]
[(3, 'project1', 'errorMessage1'),
 (3, 'project2', 'errorMessage1'),
 (1, 'project3', 'errorMessage2')]
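
The same accumulation can also be written with collections.Counter, which the question mentions trying. This is only a minimal sketch, assuming the same list l as above; the names totals and result are illustrative and not part of the original answer.

```python
from collections import Counter

l = [(1, 'project1', 'errorMessage1'),
     (2, 'project1', 'errorMessage1'),
     (3, 'project2', 'errorMessage1'),
     (1, 'project3', 'errorMessage2')]

# A Counter returns 0 for missing keys, so no membership check is needed.
totals = Counter()
for count, project, error in l:
    totals[(project, error)] += count

# Rebuild (sum, project, error) tuples; Counter is a dict subclass,
# so keys come back in insertion order on Python 3.7+.
result = [(total, *key) for key, total in totals.items()]
print(result)
```

Compared with the plain dict, this only removes the if/else branch; the keys and the final list comprehension stay the same.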

Assuming you want to sum the first (index 0) element of each tuple, you can also do this without a dict, using itertools.groupby and sum:

from itertools import groupby
from operator import itemgetter

rows = [
    (1, 'project1', 'errorMessage1'),
    (2, 'project1', 'errorMessage1'),
    (3, 'project2', 'errorMessage1'),
    (1, 'project3', 'errorMessage2'),
]

def sum_by_project_and_error(rows):
    # groupby only merges adjacent items, so the iterable must first be sorted
    # by the elements we want to group by.
    # We sort by project and error message (indices 1 and 2 of each tuple) using itemgetter.
    key_function = itemgetter(1, 2)
    sorted_rows = sorted(rows, key=key_function)
    grouped_rows = groupby(sorted_rows, key=key_function)

    for (project, error), group in grouped_rows:
        # Sum the counts (index 0) of every tuple in the group.
        yield sum(count for count, _, _ in group), project, error


output = list(sum_by_project_and_error(rows))
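
To illustrate why the sort matters for the groupby approach, here is a small sketch using hypothetical data in which the two 'project1' tuples are not adjacent. The helper sum_groups and the variable names are assumptions made for this illustration, not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted data: the two 'project1' rows are separated.
rows = [
    (1, 'project1', 'errorMessage1'),
    (3, 'project2', 'errorMessage1'),
    (2, 'project1', 'errorMessage1'),
]
key = itemgetter(1, 2)

def sum_groups(iterable):
    # groupby starts a new group every time the key changes.
    return [(sum(r[0] for r in group), *k)
            for k, group in groupby(iterable, key=key)]

unsorted_result = sum_groups(rows)                   # 'project1' shows up twice
sorted_result = sum_groups(sorted(rows, key=key))    # 'project1' is merged
print(unsorted_result)
print(sorted_result)
```

Without the sort, 'project1' produces two separate groups with sums 1 and 2; after sorting by the same key, it collapses into a single tuple with sum 3. A side effect is that the groupby result is ordered by key rather than by first appearance, unlike the dict version.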

