Python dictionary: Sum of all the values for all the “distinct first word” in the “key.”

Question

I have a dictionary whose keys are pairs of words and whose values are counts. Example:

bigram_counts = {(u',', u'which'): 1, (u'of', u'the'): 2, ('<UNK>', u'by'): 2, (u'in', '<UNK>'): 1, ('<UNK>', u'charge'): 1, (u'``', '<UNK>'): 2, (u'The', u'and'): 1, ('<UNK>', u'reports'): 1, (u'an', '<UNK>'): 1, (u'election', u'was'): 1, ('<UNK>', u'primary'): 2, (u'that', '<UNK>'): 1, (u'that', u'the'): 1, (u'and', u'Fulton'): 1, ('<UNK>', u'to'): 1, (u'primary', u'election'): 1, (u'had', u'been'): 1, (u'primary', u'which'): 1, (u'The', '<UNK>'): 1, (u'the', u'election'): 2, (u'irregularities', u'took'): 1, (u',', u'``'): 1, ('<UNK>', u'that'): 1, ('<UNK>', u'of'): 2, (u'the', u'City'): 2, (u'in', u'which'): 1, (u'jury', '<UNK>'): 1, ('<UNK>', u'.'): 2, ('<UNK>', u'the'): 1, (u'of', u"Atlanta's"): 1, ('<UNK>', u'jury'): 1, (u'had', '<UNK>'): 1, (u'election', '<UNK>'): 1, (u'Fulton', u'County'): 1, ('<UNK>', u'``'): 2, (u'of', '<UNK>'): 1, ('<UNK>', u'said'): 2, (u'place', u'.'): 1, ('<UNK>', u'and'): 1, (u'election', u','): 1, (u"Atlanta's", '<UNK>'): 1, (u'which', u'the'): 1, (u'been', '<UNK>'): 1, (u'charge', u'of'): 1, (u'County', '<UNK>'): 1, (u'by', u'Fulton'): 1, (u'reports', u'of'): 1, (u'manner', u'in'): 1, ('<UNK>', u'an'): 1, (u"''", u'in'): 1, (u'the', '<UNK>'): 2, (u'said', '<UNK>'): 1, (u'Fulton', '<UNK>'): 1, (u'The', u'jury'): 1, (u'Atlanta', u"''"): 1, (u'``', u'irregularities'): 1, (u'in', u'the'): 1, (u'took', u'place'): 1, (u'for', u'the'): 1, (u'irregularities', u"''"): 1, ('<S>', u'The'): 3, (u"''", u'that'): 1, (u'City', '<UNK>'): 1, (u'which', u'was'): 1, (u"''", u'for'): 1, (u'was', '<UNK>'): 2, (u'jury', u'had'): 1, (u'said', u'in'): 1, (u'by', '<UNK>'): 1, ('<UNK>', u"''"): 1, ('<UNK>', u'irregularities'): 1, (u'to', '<UNK>'): 1, (u'.', '</S>'): 3, (u'of', u'Atlanta'): 1, ('<UNK>', u','): 1, (u'City', u'of'): 1, (u'and', '<UNK>'): 1, (u'which', u'had'): 1, (u'the', u'manner'): 1, ('<UNK>', '<UNK>'): 12}

I want to build a new dictionary that, for each distinct first word in the keys, holds the sum of the values over all keys starting with that word; the second word can be anything. For example, in bigram_counts above there are 4 entries that have u'of' as the first word of the key, and their values sum to 5.

I also have another dictionary that holds all the distinct words, which may help with the calculation. Example:

unigram_counts = {u'and': 2, u'City': 2, u"Atlanta's": 1, u'primary': 2, u'an': 1, u"''": 3, u'election': 3, u'in': 3, '<UNK>': 35, u'said': 2, u'for': 1, u'had': 2, u',': 2, u'been': 1, u'.': 3, u'to': 1, u'charge': 1, u'which': 3, u'Atlanta': 1, u'was': 2, u'``': 3, u'jury': 2, u'that': 2, '<S>': 4, u'took': 1, u'The': 3, u'by': 2, u'Fulton': 2, u'of': 5, u'reports': 1, u'irregularities': 2, u'County': 1, u'place': 1, u'the': 7, '</S>': 1, u'manner': 1}

Actually, unigram_counts already has the sum I want. However, I need to compute the sum from bigram_counts and match it against the values of unigram_counts.

Thanks

Answer 1

@Baruchel is right that this is probably a bad structure for this task, but anyway:

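import collections

unigram_counts = collections.defaultdict(int)
# Unpack each (first, second) key; only the first word is needed.
for (first, _), val in bigram_counts.items():  # .items() works on Python 2 and 3
    unigram_counts[first] += val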

It seems to work
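
To match the computed sums against the existing unigram_counts, as the question asks, a comparison along these lines might work. This is only a sketch on a small subset of the question's data (the bigrams starting with 'of' and 'had'); the names first_word_sums and mismatches are illustrative and do not come from the original post.

```python
from collections import defaultdict

# Subset of the question's data: every bigram whose first word is 'of' or 'had'.
bigram_counts = {
    ('of', 'the'): 2,
    ('of', "Atlanta's"): 1,
    ('of', '<UNK>'): 1,
    ('of', 'Atlanta'): 1,
    ('had', 'been'): 1,
    ('had', '<UNK>'): 1,
}
unigram_counts = {'of': 5, 'had': 2}

# Sum the bigram counts per first word (kept under a separate, hypothetical
# name so the reference unigram_counts is not overwritten).
first_word_sums = defaultdict(int)
for (first, _second), count in bigram_counts.items():
    first_word_sums[first] += count

# Collect words whose summed bigram count differs from the unigram count.
mismatches = {
    word: (total, unigram_counts.get(word))
    for word, total in first_word_sums.items()
    if unigram_counts.get(word) != total
}

print(dict(first_word_sums))
print(mismatches)
```

On the full data, a word that only ever appears as the second element of a bigram (such as '</S>') never shows up in first_word_sums, so it would need separate handling in the comparison.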

Answer 2

It looks like the method you chose for storing your data is not the best for this task. You can achieve it, of course, but maybe another storage system would be more convenient. However, if you really have to do it this way, you should try something like:

unigram_counts = {}
for e in bigram_counts:
    # e is a (first, second) tuple; look up the count with the whole key
    if e[0] in unigram_counts:
        unigram_counts[e[0]] += bigram_counts[e]
    else:
        unigram_counts[e[0]] = bigram_counts[e]

(not tested, but that is the idea).
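
One possible reading of "another storage system" is a nested dictionary keyed by first word and then by second word. That is an assumption about what the answer had in mind, not something it states, and the names nested_counts and totals are hypothetical. A minimal sketch on a few of the question's bigrams:

```python
from collections import defaultdict

# A few bigrams taken from the question's data.
bigram_counts = {
    ('of', 'the'): 2,
    ('of', 'Atlanta'): 1,
    ('had', 'been'): 1,
}

# Regroup as {first_word: {second_word: count}}.
nested_counts = defaultdict(dict)
for (first, second), count in bigram_counts.items():
    nested_counts[first][second] = count

# With this layout, the total for a first word is a sum over one inner dict.
totals = {first: sum(seconds.values()) for first, seconds in nested_counts.items()}
print(totals)
```

With this layout, the per-word sum no longer needs a pass over every bigram key, at the cost of a slightly more involved lookup for an individual pair.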

solution1
1 ACCEPTED 2016-10-20 14:14:37

solution2
0 2016-10-20 14:11:06
