简体   繁体   中英

Python dictionary: Sum of all the values for all the “distinct first word” in the “key.”

I have a dictionary that has a combination of two words as "keys", and certain number as its "values." Example:

bigram_counts = {(u',', u'which'): 1, (u'of', u'the'): 2, ('<UNK>', u'by'): 2, (u'in', '<UNK>'): 1, ('<UNK>', u'charge'): 1, (u'``', '<UNK>'): 2, (u'The', u'and'): 1, ('<UNK>', u'reports'): 1, (u'an', '<UNK>'): 1, (u'election', u'was'): 1, ('<UNK>', u'primary'): 2, (u'that', '<UNK>'): 1, (u'that', u'the'): 1, (u'and', u'Fulton'): 1, ('<UNK>', u'to'): 1, (u'primary', u'election'): 1, (u'had', u'been'): 1, (u'primary', u'which'): 1, (u'The', '<UNK>'): 1, (u'the', u'election'): 2, (u'irregularities', u'took'): 1, (u',', u'``'): 1, ('<UNK>', u'that'): 1, ('<UNK>', u'of'): 2, (u'the', u'City'): 2, (u'in', u'which'): 1, (u'jury', '<UNK>'): 1, ('<UNK>', u'.'): 2, ('<UNK>', u'the'): 1, (u'of', u"Atlanta's"): 1, ('<UNK>', u'jury'): 1, (u'had', '<UNK>'): 1, (u'election', '<UNK>'): 1, (u'Fulton', u'County'): 1, ('<UNK>', u'``'): 2, (u'of', '<UNK>'): 1, ('<UNK>', u'said'): 2, (u'place', u'.'): 1, ('<UNK>', u'and'): 1, (u'election', u','): 1, (u"Atlanta's", '<UNK>'): 1, (u'which', u'the'): 1, (u'been', '<UNK>'): 1, (u'charge', u'of'): 1, (u'County', '<UNK>'): 1, (u'by', u'Fulton'): 1, (u'reports', u'of'): 1, (u'manner', u'in'): 1, ('<UNK>', u'an'): 1, (u"''", u'in'): 1, (u'the', '<UNK>'): 2, (u'said', '<UNK>'): 1, (u'Fulton', '<UNK>'): 1, (u'The', u'jury'): 1, (u'Atlanta', u"''"): 1, (u'``', u'irregularities'): 1, (u'in', u'the'): 1, (u'took', u'place'): 1, (u'for', u'the'): 1, (u'irregularities', u"''"): 1, ('<S>', u'The'): 3, (u"''", u'that'): 1, (u'City', '<UNK>'): 1, (u'which', u'was'): 1, (u"''", u'for'): 1, (u'was', '<UNK>'): 2, (u'jury', u'had'): 1, (u'said', u'in'): 1, (u'by', '<UNK>'): 1, ('<UNK>', u"''"): 1, ('<UNK>', u'irregularities'): 1, (u'to', '<UNK>'): 1, (u'.', '</S>'): 3, (u'of', u'Atlanta'): 1, ('<UNK>', u','): 1, (u'City', u'of'): 1, (u'and', '<UNK>'): 1, (u'which', u'had'): 1, (u'the', u'manner'): 1, ('<UNK>', '<UNK>'): 12}

I want to return a new dictionary that has the sum of all the values for all the "distinct first word" in the "key," and the second word could be any word. Example: In bigram_counts above, there are 4 elements that has " u'of' " as their "first word" in key and sum of their values is 5.

I also have another dictionary that has all the "distinct words" to help in calculation. Example:

unigram_counts = {u'and': 2, u'City': 2, u"Atlanta's": 1, u'primary': 2, u'an': 1, u"''": 3, u'election': 3, u'in': 3, '<UNK>': 35, u'said': 2, u'for': 1, u'had': 2, u',': 2, u'been': 1, u'.': 3, u'to': 1, u'charge': 1, u'which': 3, u'Atlanta': 1, u'was': 2, u'``': 3, u'jury': 2, u'that': 2, '<S>': 4, u'took': 1, u'The': 3, u'by': 2, u'Fulton': 2, u'of': 5, u'reports': 1, u'irregularities': 2, u'County': 1, u'place': 1, u'the': 7, '</S>': 1, u'manner': 1}

Actually, unigram_counts already has the sum I want. However, I need to compute the sum from bigram_counts and match it against the values of unigram_counts.

Thanks

@Baruchel is right that this is probably bad structure to handle this, but anyway:

unigram_counts = collections.defaultdict(int)
for (first, _), val in bigram_counts.iteritems():
    unigram_counts[first] += val

It seems to work

It looks like the method you chose for storing your data is not the best for this task. You can achieve it, of course, but maybe another storage system would be more convenient. However, if you really have to do it this way, you should try something like:

unigram_counts = {}
for e in bigram_counts:
    if e[0] in unigram_counts:
        unigram_counts[e[0]] += bigram_counts[e[0]]
    else:
        unigram_counts[e[0]] = bigram_counts[e[0]]

(not tested, but that is the idea).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM