Python dictionary: sum of all values for each distinct first word in the keys

Question

I have a dictionary whose keys are two-word tuples (bigrams) and whose values are counts. For example:

bigram_counts = {(u',', u'which'): 1, (u'of', u'the'): 2, ('<UNK>', u'by'): 2, (u'in', '<UNK>'): 1, ('<UNK>', u'charge'): 1, (u'``', '<UNK>'): 2, (u'The', u'and'): 1, ('<UNK>', u'reports'): 1, (u'an', '<UNK>'): 1, (u'election', u'was'): 1, ('<UNK>', u'primary'): 2, (u'that', '<UNK>'): 1, (u'that', u'the'): 1, (u'and', u'Fulton'): 1, ('<UNK>', u'to'): 1, (u'primary', u'election'): 1, (u'had', u'been'): 1, (u'primary', u'which'): 1, (u'The', '<UNK>'): 1, (u'the', u'election'): 2, (u'irregularities', u'took'): 1, (u',', u'``'): 1, ('<UNK>', u'that'): 1, ('<UNK>', u'of'): 2, (u'the', u'City'): 2, (u'in', u'which'): 1, (u'jury', '<UNK>'): 1, ('<UNK>', u'.'): 2, ('<UNK>', u'the'): 1, (u'of', u"Atlanta's"): 1, ('<UNK>', u'jury'): 1, (u'had', '<UNK>'): 1, (u'election', '<UNK>'): 1, (u'Fulton', u'County'): 1, ('<UNK>', u'``'): 2, (u'of', '<UNK>'): 1, ('<UNK>', u'said'): 2, (u'place', u'.'): 1, ('<UNK>', u'and'): 1, (u'election', u','): 1, (u"Atlanta's", '<UNK>'): 1, (u'which', u'the'): 1, (u'been', '<UNK>'): 1, (u'charge', u'of'): 1, (u'County', '<UNK>'): 1, (u'by', u'Fulton'): 1, (u'reports', u'of'): 1, (u'manner', u'in'): 1, ('<UNK>', u'an'): 1, (u"''", u'in'): 1, (u'the', '<UNK>'): 2, (u'said', '<UNK>'): 1, (u'Fulton', '<UNK>'): 1, (u'The', u'jury'): 1, (u'Atlanta', u"''"): 1, (u'``', u'irregularities'): 1, (u'in', u'the'): 1, (u'took', u'place'): 1, (u'for', u'the'): 1, (u'irregularities', u"''"): 1, ('<S>', u'The'): 3, (u"''", u'that'): 1, (u'City', '<UNK>'): 1, (u'which', u'was'): 1, (u"''", u'for'): 1, (u'was', '<UNK>'): 2, (u'jury', u'had'): 1, (u'said', u'in'): 1, (u'by', '<UNK>'): 1, ('<UNK>', u"''"): 1, ('<UNK>', u'irregularities'): 1, (u'to', '<UNK>'): 1, (u'.', '</S>'): 3, (u'of', u'Atlanta'): 1, ('<UNK>', u','): 1, (u'City', u'of'): 1, (u'and', '<UNK>'): 1, (u'which', u'had'): 1, (u'the', u'manner'): 1, ('<UNK>', '<UNK>'): 12}

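I want to build a new dictionary that holds, for each distinct first word in the keys, the sum of all values whose key starts with that word; the second word can be anything. For example, in bigram_counts above, four keys have u'of' as their first word, and their values sum to 5.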

I also have another dictionary containing all the distinct words, which may help with the computation. For example:

unigram_counts = {u'and': 2, u'City': 2, u"Atlanta's": 1, u'primary': 2, u'an': 1, u"''": 3, u'election': 3, u'in': 3, '<UNK>': 35, u'said': 2, u'for': 1, u'had': 2, u',': 2, u'been': 1, u'.': 3, u'to': 1, u'charge': 1, u'which': 3, u'Atlanta': 1, u'was': 2, u'``': 3, u'jury': 2, u'that': 2, '<S>': 4, u'took': 1, u'The': 3, u'by': 2, u'Fulton': 2, u'of': 5, u'reports': 1, u'irregularities': 2, u'County': 1, u'place': 1, u'the': 7, '</S>': 1, u'manner': 1}

In fact, unigram_counts already holds the sums I want. However, I need to compute the sums from bigram_counts and check them against the values in unigram_counts.

Thanks.

Answer 1

@Baruchel is right that this is probably a poor structure for this problem, but anyway:

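import collections

# Sum the counts of all bigrams that share the same first word.
unigram_counts = collections.defaultdict(int)
for (first, _), val in bigram_counts.items():  # on Python 2, .iteritems() avoids building a list
    unigram_counts[first] += val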

This seems to work.
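Since the question also asks to check the derived sums against unigram_counts, here is a minimal sketch of that comparison. It uses a small hand-picked subset of the question's data; the names sample_bigrams, sample_unigrams, first_word_totals and mismatches are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical subset of the question's data: only the 'of' and 'had' bigrams.
sample_bigrams = {
    ('of', 'the'): 2,
    ('of', "Atlanta's"): 1,
    ('of', '<UNK>'): 1,
    ('of', 'Atlanta'): 1,
    ('had', 'been'): 1,
    ('had', '<UNK>'): 1,
}
sample_unigrams = {'of': 5, 'had': 2}


def first_word_totals(bigram_counts):
    """Sum the counts of all bigrams that share the same first word."""
    totals = defaultdict(int)
    for (first, _second), count in bigram_counts.items():
        totals[first] += count
    return dict(totals)


def mismatches(derived, expected):
    """Return {word: (derived_total, expected_total)} where the two differ.

    A word missing from either dictionary is treated as having a total of 0.
    """
    words = set(derived) | set(expected)
    return {
        w: (derived.get(w, 0), expected.get(w, 0))
        for w in words
        if derived.get(w, 0) != expected.get(w, 0)
    }


totals = first_word_totals(sample_bigrams)
print(totals)
print(mismatches(totals, sample_unigrams))
```

On the full data from the question, some mismatches are to be expected: for instance, '</S>' never appears as the first word of a bigram, so it has no derived total at all.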

Answer 2

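The way you have chosen to store the data does not look like the best fit for this task. You can certainly make it work, but a different storage layout might be more convenient. If you really need it this way, though, try the following: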

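# Sum the counts of all bigrams that share the same first word.
unigram_counts = {}
for e in bigram_counts:
    if e[0] in unigram_counts:
        unigram_counts[e[0]] += bigram_counts[e]  # look up by the full bigram key, not e[0]
    else:
        unigram_counts[e[0]] = bigram_counts[e]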

(Untested, but that's the idea.)
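The answer does not say which alternative storage layout it has in mind. One possibility, offered here purely as an assumption, is a nested dictionary that maps each first word to its followers, so that the per-first-word total becomes a simple sum. The names sample_bigrams and nested are hypothetical.

```python
from collections import defaultdict

# Hypothetical subset of the question's data.
sample_bigrams = {('of', 'the'): 2, ('of', 'Atlanta'): 1, ('had', 'been'): 1}

# Regroup the flat bigram dictionary as: first word -> {second word: count}.
nested = defaultdict(dict)
for (first, second), count in sample_bigrams.items():
    nested[first][second] = count

# With this layout, the total for a first word is the sum of its followers' counts.
totals = {first: sum(followers.values()) for first, followers in nested.items()}
print(totals)
```

With this layout, looking up everything that follows a given word is a single dictionary access, at the cost of a conversion step if the data arrives as flat tuples.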

2 answers

Answer 1
Score 1, accepted, 2016-10-20 14:14:37

Answer 2
Score 0, 2016-10-20 14:11:06
