
Nested list summation by group in python

I have a nested list of the form [[county, political party, votes received], ...], where the three fields are a string, a string, and an int.

How do I take a nested list and do summations by political party? I would like to have a table that compares all of the different political parties and has their total vote counts.

I know that I can just use a dict or pandas (groupby), but I would like to learn how to do this without them. I cannot find any questions that directly relate to this situation.

You'll need to iterate through all the sub-lists and keep a running total per party in a map (a dict):

sums = {}
for _, party, votes in big_list:  # each sub-list is [county, party, votes]
    # if the party already has a running total, add to it;
    # otherwise start from a total of zero
    sums[party] = sums.get(party, 0) + votes

# to read the totals, just iterate over the map
for party, total_votes in sums.items():
    print(party, total_votes)
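
As a quick check of the loop above, here is a small end-to-end run that also prints the totals as a two-column table. The sample rows are invented for illustration; only the [county, party, votes] layout comes from the question.

```python
# Hypothetical sample data in the question's [county, party, votes] layout.
big_list = [
    ["Adams", "Red", 120],
    ["Adams", "Blue", 80],
    ["Baker", "Red", 30],
    ["Baker", "Green", 15],
    ["Clark", "Blue", 200],
]

sums = {}
for _, party, votes in big_list:
    sums[party] = sums.get(party, 0) + votes

# Order the parties by total votes, largest first, and print a simple table.
rows = sorted(sums.items(), key=lambda item: item[1], reverse=True)
print(f"{'Party':<10}{'Votes':>8}")
for party, total_votes in rows:
    print(f"{party:<10}{total_votes:>8}")
```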

A dictionary is going to be the most efficient option (a single pass with constant-time lookups on average), but there are other, slower approaches.

Sorting, for example, which places rows of the same party next to each other so they can be summed in one pass:

totalList = []
for _, party, votes in sorted(voteList, key=lambda v: v[1]):
    if not totalList or totalList[-1][0] != party:
        totalList.append([party, votes])   # first row of a new party
    else:
        totalList[-1][1] += votes          # same party as the previous row
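
The sort-then-accumulate idea above can also be written with itertools.groupby, which groups consecutive items that share a key. This is only a sketch of the same technique; the sample rows are made up for illustration and are not from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data in the [county, party, votes] layout.
voteList = [
    ["Adams", "Red", 120],
    ["Adams", "Blue", 80],
    ["Baker", "Red", 30],
    ["Baker", "Green", 15],
    ["Clark", "Blue", 200],
]

by_party = itemgetter(1)  # picks the party field out of a row

# groupby only merges adjacent rows, so the list must be sorted by party first.
totalList = [
    [party, sum(votes for _, _, votes in rows)]
    for party, rows in groupby(sorted(voteList, key=by_party), key=by_party)
]

print(totalList)
```

Because the rows are sorted by party, the result comes out in alphabetical party order.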

Multiple passes using distinct party names:

parties = {party for _, party, _ in voteList}  # set of distinct parties
totalList = [[party, sum(votes for _, p, votes in voteList if p == party)]
             for party in parties]
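
Since the question asks how to do this without a dict, here is a sketch that uses nothing but lists: one list of party names and a parallel list of running totals. The sample data is hypothetical. The `in` and `index` calls scan the list, so this is slower than a dict when there are many parties, but it keeps the parties in order of first appearance.

```python
# Hypothetical sample data in the [county, party, votes] layout.
voteList = [
    ["Adams", "Red", 120],
    ["Adams", "Blue", 80],
    ["Baker", "Red", 30],
    ["Baker", "Green", 15],
    ["Clark", "Blue", 200],
]

parties = []  # distinct party names, in order of first appearance
totals = []   # totals[i] is the running vote count for parties[i]

for _, party, votes in voteList:
    if party in parties:
        totals[parties.index(party)] += votes
    else:
        parties.append(party)
        totals.append(votes)

# Pair each party with its total to get the comparison table.
totalList = [[party, total] for party, total in zip(parties, totals)]
print(totalList)
```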

There is also the Counter class from collections, a dict subclass that is specialized for this type of tallying:

from collections import Counter
totals = Counter()
for _, party, votes in voteList:
    totals[party] += votes  # missing keys count as zero
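
A Counter also offers most_common(), which returns the (key, count) pairs sorted from highest to lowest count, so it can produce the comparison table directly. A minimal sketch with made-up sample data:

```python
from collections import Counter

# Hypothetical sample data in the [county, party, votes] layout.
voteList = [
    ["Adams", "Red", 120],
    ["Adams", "Blue", 80],
    ["Baker", "Red", 30],
    ["Baker", "Green", 15],
    ["Clark", "Blue", 200],
]

totals = Counter()
for _, party, votes in voteList:
    totals[party] += votes

# most_common() with no argument lists every party, largest total first.
ranking = totals.most_common()
for party, total_votes in ranking:
    print(party, total_votes)
```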

I basically tried grouping by a header and then summing every column that has a pre-defined, valid (numeric) datatype. I've been trying to avoid nested loops. About the input: headers and datatypes are both lists, and content is a nested list.

import itertools

def group_by_header(headers, content, datatypes, /):
    key_name = "Party"
    key_index = headers.index(key_name)
    sorted_by_header = sorted(content, key=lambda x: x[key_index])
    group_by_header = {}

    # create iterator
    it = iter(sorted_by_header)
    for k, g in itertools.groupby(it, lambda x: x[key_index]):
        def sum_of_nums(column, dt, i):
            if dt[i] == "float":
                tmp = [float(s) for s in column]
                yield sum(tmp)
            elif dt[i] == "int":
                tmp = [int(s) for s in column]
                yield sum(tmp)
            else:
                yield "NA"

        def generate_sum(group):
            index = 0
            for c in zip(*list(group)):
                yield sum_of_nums(c, datatypes, index)
                index += 1

        group_by_header.setdefault(k, [])
        for j in generate_sum(g):
            group_by_header[k].append(next(j))

    return group_by_header
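
To make the expected inputs and output concrete, here is a condensed sketch of the same idea together with a sample call. It is a simplified variant, not the function above: the name sum_columns_by_key, the key_name parameter, the "str" datatype label, and the sample rows are all assumptions made for illustration. Values are given as strings because the original converts them with int() and float().

```python
import itertools

def sum_columns_by_key(headers, content, datatypes, key_name="Party"):
    """Group rows by one column and sum every "int"/"float" column per group."""
    key_index = headers.index(key_name)
    converters = {"int": int, "float": float}

    # groupby only merges adjacent rows, so sort by the key column first.
    rows = sorted(content, key=lambda row: row[key_index])

    result = {}
    for key, group in itertools.groupby(rows, key=lambda row: row[key_index]):
        columns = zip(*group)  # transpose the group's rows into columns
        result[key] = [
            sum(converters[dt](value) for value in column)
            if dt in converters
            else "NA"  # non-numeric columns cannot be summed
            for column, dt in zip(columns, datatypes)
        ]
    return result

# Hypothetical input: one header and one datatype label per column.
headers = ["County", "Party", "Votes"]
datatypes = ["str", "str", "int"]
content = [
    ["Adams", "Red", "120"],
    ["Adams", "Blue", "80"],
    ["Baker", "Red", "30"],
]

totals = sum_columns_by_key(headers, content, datatypes)
print(totals)
```

Each key maps to one entry per column, with "NA" in the positions of the non-numeric columns.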

