I have a list of lists of the form:
[['about70-130 characters long string', '332'], ['someotherrandomstring','2'], ['about70-130 characters long string', 32], ['someotherrandomstring', '3333']]
TO DO: I eventually want to sum the sizes of all the repeated strings like so:
[['about70-130 characters long string',364], ['someotherrandomstring',3335]]
I wrote brute-force code to solve this, but it's taking a long time because the list contains about 2 million sublists. The inefficient code I wrote is:
final = {}
for element in both_list:
    size = int(element[1])
    if element[0] not in final.keys():
        final[element[0]] = size
    else:
        final[element[0]] += size
I'm pretty sure there's a more time-efficient code but I can't seem to come up with any ideas. Any help and pointers in the right direction would be much appreciated. Thank you.
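For what it's worth, the standard library's collections.Counter is built for exactly this accumulation pattern (a missing key reads as 0, so no membership test is needed); a minimal sketch on the sample data from the question:

```python
from collections import Counter

both_list = [
    ['about70-130 characters long string', '332'],
    ['someotherrandomstring', '2'],
    ['about70-130 characters long string', 32],
    ['someotherrandomstring', '3333'],
]

# A missing key in a Counter defaults to 0, so one line accumulates;
# int() normalizes the mixed str/int sizes in the input.
final = Counter()
for key, size in both_list:
    final[key] += int(size)

print(dict(final))
# {'about70-130 characters long string': 364, 'someotherrandomstring': 3335}
```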
If you are okay with using the third-party library pandas:
import pandas as pd

a = [['about70-130 characters long string', '332'],
     ['someotherrandomstring', '2'],
     ['about70-130 characters long string', 32],
     ['someotherrandomstring', '3333']]

df = pd.DataFrame(a, columns=['label', 'counts'])
df.counts = df.counts.astype('int')
df.groupby('label')['counts'].sum().to_dict()
It might be a little faster than your solution:
a = [['about70-130 characters long string', '332'],
     ['someotherrandomstring', '2'],
     ['about70-130 characters long string', 32],
     ['someotherrandomstring', '3333']]

d = {}
for i in a:
    if i[0] not in d:
        d[i[0]] = int(i[1])
    else:
        d[i[0]] += int(i[1])
Using itertools.groupby with operator.itemgetter (or a lambda):
from itertools import groupby
from operator import itemgetter

lst = [['about70-130 characters long string', '332'],
       ['someotherrandomstring', '2'],
       ['about70-130 characters long string', 32],
       ['someotherrandomstring', '3333']]

# groupby only groups consecutive equal keys, so sort by key first
lst = sorted(lst, key=itemgetter(0))

res = []
for k, g in groupby(lst, key=itemgetter(0)):
    res.append([k, sum(int(i[1]) for i in g)])

print(res)
# [['about70-130 characters long string', 364], ['someotherrandomstring', 3335]]
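Note that groupby requires sorted input, which costs O(n log n), while the single-pass dict accumulation in the question is already O(n); for ~2 million sublists the sort will likely dominate. A quick sanity check (sample data from the question) that both approaches produce the same totals:

```python
from itertools import groupby
from operator import itemgetter

data = [['about70-130 characters long string', '332'],
        ['someotherrandomstring', '2'],
        ['about70-130 characters long string', 32],
        ['someotherrandomstring', '3333']]

# O(n log n): groupby needs the input sorted by key
srt = sorted(data, key=itemgetter(0))
grouped = {k: sum(int(v) for _, v in g)
           for k, g in groupby(srt, key=itemgetter(0))}

# O(n): single-pass dict accumulation, as in the question
acc = {}
for k, v in data:
    acc[k] = acc.get(k, 0) + int(v)

print(grouped == acc)
# True
```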