I have a nested list of the form [[county, political party, votes received]], with the data types string, string, and int.
How do I take this nested list and sum the votes by political party? I would like a table that compares all of the political parties and shows their total vote counts.
I know that I can just use a dict or pandas (groupby), but I would like to learn how to do this without them. I cannot find any questions that directly relate to this situation.
You'll need to iterate through all the sub-lists and accumulate each party's sum in a map (a dict in Python):
sums = {}
for i in big_list:
    _, party, votes = i  # based on the question
    # if the party already has a summation, just get it;
    # otherwise start from a summation of zero
    sums[party] = sums.get(party, 0) + votes

# to get them, just iterate over the map
for party, total_votes in sums.items():
    print(party, total_votes)
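To try this on concrete data, here is a small sketch. The county names and vote counts in `big_list` are made up for illustration, and the column widths in the format string are arbitrary choices.

```python
# Hypothetical sample data: [county, political party, votes received]
big_list = [
    ["Adams", "Democratic", 120],
    ["Adams", "Republican", 95],
    ["Baker", "Democratic", 80],
    ["Baker", "Green", 15],
    ["Clark", "Republican", 60],
]

sums = {}
for _, party, votes in big_list:
    # start from zero the first time a party is seen
    sums[party] = sums.get(party, 0) + votes

# print a simple two-column table, largest total first
for party, total_votes in sorted(sums.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{party:<12}{total_votes:>8}")
```

Sorting the items by total is optional; it only makes the comparison table easier to read.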
A dictionary is going to be more efficient but there are other (slower) approaches.
Sorting for example:
totalList = []
for _, party, votes in sorted(voteList, key=lambda v: v[1]):
    # after sorting, rows of the same party are adjacent
    if not totalList or totalList[-1][0] != party:
        totalList.append([party, votes])
    else:
        totalList[-1][1] += votes
Multiple passes using distinct party names:
parties = {party for _, party, _ in voteList}  # set of distinct parties
totalList = [[party, sum(votes for _, p, votes in voteList if p == party)]
             for party in parties]
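Since the question asks for a solution without a dict, here is a sketch that uses only lists (no dict, set, or pandas). The function name `total_by_party` and the sample data are hypothetical. It searches the running totals linearly for each row, so it is slower than the dictionary version when there are many parties.

```python
def total_by_party(vote_list):
    """Sum votes per party using only lists."""
    totals = []  # each entry is [party, running_total]
    for _, party, votes in vote_list:
        for entry in totals:
            if entry[0] == party:
                entry[1] += votes
                break
        else:
            # the inner loop ended without break: first time this party is seen
            totals.append([party, votes])
    return totals


# Hypothetical sample data: [county, political party, votes received]
vote_list = [
    ["Adams", "Democratic", 120],
    ["Adams", "Republican", 95],
    ["Baker", "Democratic", 80],
    ["Baker", "Green", 15],
    ["Clark", "Republican", 60],
]

result = total_by_party(vote_list)
for party, total in result:
    print(party, total)
```

The parties come out in the order in which they first appear in the input.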
There is also the Counter class from collections that is a specialized dictionary for this type of thing:
from collections import Counter

totals = Counter()
for _, party, votes in voteList:
    totals[party] += votes
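A `Counter` also makes it easy to rank the parties, because `most_common()` returns the (key, count) pairs ordered from the highest count to the lowest. A short sketch, with made-up sample data in `vote_list`:

```python
from collections import Counter

# Hypothetical sample data: [county, political party, votes received]
vote_list = [
    ["Adams", "Democratic", 120],
    ["Adams", "Republican", 95],
    ["Baker", "Democratic", 80],
    ["Baker", "Green", 15],
    ["Clark", "Republican", 60],
]

totals = Counter()
for _, party, votes in vote_list:
    totals[party] += votes  # missing keys start at 0 in a Counter

# (party, total) pairs, highest total first
ranking = totals.most_common()
for party, total in ranking:
    print(party, total)
```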
I basically tried grouping by a header and then applying a summation to every column that has a predefined valid data type. I've been trying to avoid nested loops. About the input: headers and datatypes are both lists, and content is a nested list.
import itertools

def group_by_header(headers, content, datatypes, /):
    key_name = "Party"
    key_index = headers.index(key_name)
    # groupby only merges adjacent rows, so sort by the key column first
    sorted_by_header = sorted(content, key=lambda x: x[key_index])

    def sum_of_nums(column, dt, i):
        # sum the column if its declared type is numeric, otherwise "NA"
        if dt[i] == "float":
            return sum(float(s) for s in column)
        elif dt[i] == "int":
            return sum(int(s) for s in column)
        else:
            return "NA"

    def generate_sum(group):
        # zip(*group) transposes the group's rows into columns
        for index, c in enumerate(zip(*group)):
            yield sum_of_nums(c, datatypes, index)

    grouped = {}
    for k, g in itertools.groupby(sorted_by_header, lambda x: x[key_index]):
        grouped[k] = list(generate_sum(g))
    return grouped
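The core of this approach is to sort by the key column and then let `itertools.groupby` collect adjacent rows that share a key. Here is a stripped-down sketch of that idea, which sums a single column only; the sample `headers` and `content` values are assumptions made for illustration.

```python
import itertools
from operator import itemgetter

# Hypothetical input in the shape described above
headers = ["County", "Party", "Votes"]
content = [
    ["Adams", "Democratic", 120],
    ["Adams", "Republican", 95],
    ["Baker", "Democratic", 80],
    ["Baker", "Green", 15],
    ["Clark", "Republican", 60],
]

key_index = headers.index("Party")
votes_index = headers.index("Votes")

# groupby only groups consecutive rows, so the rows must be sorted by the key
rows = sorted(content, key=itemgetter(key_index))

totals = {}
for party, group in itertools.groupby(rows, key=itemgetter(key_index)):
    totals[party] = sum(row[votes_index] for row in group)

for party, total in totals.items():
    print(party, total)
```

Without the sort, a party whose rows are not adjacent would be split into several groups, and later groups would overwrite earlier totals.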