Counting frequencies (efficiently) of strings in a large text file when their pre-counts are given

Question

I have a list of lists of the form:

[['about70-130 characters long string', '332'], ['someotherrandomstring','2'], ['about70-130 characters long string', 32], ['someotherrandomstring', '3333']]

Goal: I want to sum the sizes (the second element) of each repeated string, like so:

[['about70-130 characters long string',364], ['someotherrandomstring',3335]]

I wrote brute-force code to solve this, but it takes a long time because the list contains about 2 million sublists. My (very inefficient) code is:

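# both_list is the list of [string, count] pairs shown above
final = {}
for element in both_list:
    size = int(element[1])
    # On Python 2, final.keys() builds a new list on every iteration, so this
    # membership test is linear in len(final); "element[0] not in final" is O(1).
    if element[0] not in final.keys():
        final[element[0]] = size
    else:
        final[element[0]] += size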

I'm fairly sure there is a more time-efficient approach, but I can't come up with one. Any help or pointers in the right direction would be much appreciated. Thank you.

Answer 1

If you are okay with using the third-party library pandas:

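import pandas as pd

a = [['about70-130 characters long string', '332'],
     ['someotherrandomstring', '2'],
     ['about70-130 characters long string', 32],
     ['someotherrandomstring', '3333']]
df = pd.DataFrame(a, columns=['label', 'counts'])
# The counts column mixes str and int values, so convert it to integers first
df['counts'] = df['counts'].astype(int)
# Sum the counts per label and return them as a {label: total} dict
print(df.groupby('label')['counts'].sum().to_dict())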

It might be a little faster than your solution. Alternatively, without pandas:

a = [['about70-130 characters long string', '332'],
     ['someotherrandomstring', '2'],
     ['about70-130 characters long string', 32],
     ['someotherrandomstring', '3333']]
d = {}
for label, count in a:
    # dict.get returns 0 the first time a label is seen
    d[label] = d.get(label, 0) + int(count)
print(d)
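Since the title mentions a large text file, the same dict.get loop can also run while the file is being read, so the 2-million-element list never has to be built in memory. A minimal sketch, assuming each line holds the string, a tab, then the count (the tab-separated layout and the helper name sum_counts are hypothetical; adjust the split to the real format). An in-memory io.StringIO stands in for the file:

```python
import io


def sum_counts(lines):
    """Sum the count for each string while reading one line at a time."""
    totals = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumed format: the string, a tab, then the count.
        # rsplit keeps any tabs that appear inside the string itself.
        label, count = line.rsplit("\t", 1)
        totals[label] = totals.get(label, 0) + int(count)
    return totals


# Hypothetical in-memory stand-in for the large text file.
sample_file = io.StringIO(
    "about70-130 characters long string\t332\n"
    "someotherrandomstring\t2\n"
    "about70-130 characters long string\t32\n"
    "someotherrandomstring\t3333\n"
)
totals = sum_counts(sample_file)

# Convert to the list-of-lists shape from the question.
result = [[label, total] for label, total in totals.items()]
print(result)
# [['about70-130 characters long string', 364], ['someotherrandomstring', 3335]]
```

With a real file, pass the open file object instead, e.g. with open('counts.tsv', encoding='utf-8') as f: totals = sum_counts(f), where counts.tsv is a placeholder name. Python file objects iterate line by line, so memory use grows with the number of distinct strings rather than the number of lines.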

Answer 2

Using itertools.groupby with operator.itemgetter (or a lambda):

from itertools import groupby
from operator import itemgetter

# lst is the list of [string, count] pairs from the question.
# groupby only merges adjacent items with equal keys, so sort by the string first.
lst = sorted(lst, key=itemgetter(0))
res = []

for k, g in groupby(lst, key=itemgetter(0)):
    res.append([k, sum(int(i[1]) for i in g)])
print(res)
# [['about70-130 characters long string', 364], ['someotherrandomstring', 3335]]
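The lambda variant mentioned above simply replaces itemgetter(0) with an equivalent key function. A self-contained sketch on the sample data from the question, written as a single list comprehension:

```python
from itertools import groupby

lst = [
    ['about70-130 characters long string', '332'],
    ['someotherrandomstring', '2'],
    ['about70-130 characters long string', 32],
    ['someotherrandomstring', '3333'],
]

# groupby only merges adjacent items with equal keys, so sort by the string first.
# Each group yields the original [string, count] pairs for one string.
res = [
    [label, sum(int(count) for _, count in group)]
    for label, group in groupby(sorted(lst, key=lambda pair: pair[0]),
                                key=lambda pair: pair[0])
]
print(res)
# [['about70-130 characters long string', 364], ['someotherrandomstring', 3335]]
```

Because of the sort, this approach costs O(n log n), whereas the dictionary-based versions make a single O(n) pass; its advantage is that the output comes back sorted by string.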

2 answers

solution1
1 2018-10-09 14:22:36

solution2
-1 2018-10-09 16:01:52
