How to create a dictionary of dictionaries of dictionaries in Python

Question

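I'm taking a natural language processing class, and I need to create a trigram language model that generates random text which looks "realistic" to some degree, based on some sample data.

Essentially, I need to create a "trigram" structure to hold the various 3-word grammatical combinations. My professor hinted that this can be done with a dictionary of dictionaries of dictionaries, which I tried to create using: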

trigram = defaultdict( defaultdict(defaultdict(int)))

But the error I get is:

trigram = defaultdict( dict(dict(int)))
TypeError: 'type' object is not iterable

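How do I create a 3-level nested dictionary, i.e. a dictionary of dictionaries of dictionaries of int values?

I guess people down-vote a question on Stack Overflow if they don't know how to answer it. I'll add some background to better explain the problem for those willing to help.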

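The trigram is used to keep track of three-word patterns. Trigrams are used in text and language processing software and almost everywhere in natural language processing (think Siri or Google Now).

If we designate the three levels of dictionaries as dict1, dict2 and dict3, then parsing a text file and reading the statement "The boy runs" would give the following:

dict1 has the key "the". Accessing that key returns dict2, which contains the key "boy". Accessing that key returns the final dict3, which contains the key "runs". Accessing that key returns the value 1.

This means that "the boy runs" has appeared once in this text. If we encounter it again, we follow the same process and increment 1 to 2. If we encounter "the girl walks", then the dict2 under the "the" key will now contain another key, "girl", whose dict3 has the key "walks" with the value 1, and so on. Eventually, after parsing a large amount of text (and keeping track of the word counts), you will have a trigram that can estimate how likely a given starting word is to lead to a particular 3-word combination, based on how often those combinations appeared in the previously parsed text.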
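To make the target concrete, here is a minimal sketch of the shape such a structure would have, written as a plain dict literal. The counts are made up for illustration, and `third_word_probability` is a hypothetical helper name, not something from the course material.

```python
# Hypothetical counts after parsing a tiny corpus (numbers invented for illustration).
counts = {
    "the": {
        "boy": {"runs": 2, "walks": 1},
        "girl": {"walks": 1},
    },
}

def third_word_probability(counts, w1, w2, w3):
    """Relative frequency of w3 among all words seen after the pair (w1, w2)."""
    followers = counts.get(w1, {}).get(w2, {})
    total = sum(followers.values())
    return followers.get(w3, 0) / total if total else 0.0

# "runs" followed "the boy" in 2 of the 3 recorded cases.
print(third_word_probability(counts, "the", "boy", "runs"))
```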

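This can help you create grammar rules to identify languages or, in my case, create randomly generated text that looks very much like grammatical English. I need a three-level dictionary because at any position of a 3-word combination there can be another word that creates a whole different set of combinations. I've tried my best to explain trigrams and the purpose behind them, though I only started this class a couple of weeks ago.

Now, with all of that being said: how do I create a dictionary of dictionaries of dictionaries, whose innermost dictionary holds values of type int, in Python?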

trigram = defaultdict(defaultdict(defaultdict(int)))

throws an error for me.

Answer 1

I've tried nested defaultdicts before, and the solution seems to be a lambda call:

from collections import defaultdict

trigram = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

trigram['a']['b']['c'] += 1

It's not pretty, but I suspect the nested dictionary suggestion is for efficient lookup.
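As a rough sketch of how this could be applied to the question's "the boy runs" walkthrough, the snippet below counts trigrams with the nested defaultdict and then samples words in proportion to their counts. The sample text and the helper names `build_trigrams` and `generate` are assumptions made for illustration, not part of the original answer.

```python
import random
from collections import defaultdict

def build_trigrams(words):
    """Count every consecutive three-word combination in a list of tokens."""
    trigram = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        trigram[w1][w2][w3] += 1
    return trigram

def generate(trigram, w1, w2, length=10, rng=random):
    """Extend (w1, w2) by sampling each next word in proportion to its count."""
    out = [w1, w2]
    for _ in range(length):
        # .get does not trigger the default factory, so unseen pairs
        # are not silently added to the model while generating.
        followers = trigram.get(w1, {}).get(w2)
        if not followers:
            break
        choices, weights = zip(*followers.items())
        w3 = rng.choices(choices, weights=weights)[0]
        out.append(w3)
        w1, w2 = w2, w3
    return " ".join(out)

# Hypothetical sample text; a real model would be built from a much larger corpus.
words = "the boy runs . the boy runs . the girl walks .".split()
trigram = build_trigrams(words)
print(trigram["the"]["boy"]["runs"])  # 2
print(generate(trigram, "the", "boy", length=5, rng=random.Random(0)))
```

One thing to keep in mind: indexing a defaultdict with a missing key inserts that key, so read-only lookups are safer with `.get` or an `in` check, as `generate` does here.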

Answer 2

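In general, the solution already posted will work for creating a nested dictionary of trigrams. If you want to extend the idea to a more general solution, you can do one of the following: one approach is adapted from Perl's autovivification, and the other uses collections.defaultdict.

Solution 1: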

class ngram(dict):
    """Based on perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return super(ngram, self).__getitem__(item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Solution 2:

from collections import defaultdict
class ngram(defaultdict):
    def __init__(self):
        super(ngram, self).__init__(ngram)

Demo using Solution 1

>>> trigram = ngram()
>>> trigram['two']['three']['four'] = 4
>>> trigram
{'two': {'three': {'four': 4}}}
>>> trigram['two']
{'three': {'four': 4}}
>>> trigram['two']['three']
{'four': 4}
>>> trigram['two']['three']['four']
4

Demo using Solution 2

>>> a = ngram()
>>> a['two']['three']['four'] = 4
>>> a
defaultdict(<class '__main__.ngram'>, {'two': defaultdict(<class '__main__.ngram'>, {'three': defaultdict(<class '__main__.ngram'>, {'four': 4})})})
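One caveat when using these recursive structures for counting: a missing leaf is created as another `ngram`, not as an int, so `+= 1` on an unseen trigram raises a TypeError. The sketch below works around that with a small helper; `increment` is a hypothetical name introduced for illustration and is not part of the answer above.

```python
from collections import defaultdict

class ngram(defaultdict):
    """Recursive defaultdict, as in Solution 2."""
    def __init__(self):
        super().__init__(ngram)

def increment(tree, *keys):
    """Add 1 to the count stored under the given key path (hypothetical helper)."""
    *path, last = keys
    node = tree
    for key in path:
        node = node[key]                # missing levels are created on access
    node[last] = node.get(last, 0) + 1  # .get avoids creating an ngram at the leaf

counts = ngram()
increment(counts, "the", "boy", "runs")
increment(counts, "the", "boy", "runs")
print(counts["the"]["boy"]["runs"])  # 2

# Incrementing an unseen leaf directly fails, because the default is an ngram.
try:
    counts["the"]["girl"]["walks"] += 1
except TypeError as exc:
    print("TypeError:", exc)
```

If the depth is always exactly three and the leaves are always counts, the fixed-depth lambda form is probably the simpler choice.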

Answer 3

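The defaultdict __init__ method takes one argument, which must be a callable. The callable passed to defaultdict must be callable with no arguments and must return an instance of the default value.

The problem with nesting defaultdict as you did is that defaultdict's __init__ takes an argument. Giving the inner defaultdict that argument means that the wrapping defaultdict receives not a callable as its __init__ argument but an instance of defaultdict, which is not callable.

@pcoving's lambda solution works because it creates an anonymous function that returns a defaultdict, which is itself initialized with a function that returns the correct type of defaultdict for each layer of the nesting.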
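A minimal sketch of the difference, using only the standard library. The `functools.partial` variant is an alternative way to build the zero-argument callable and is my addition, not something proposed in the answers above.

```python
from collections import defaultdict
from functools import partial

# Passing an instance instead of a callable fails immediately.
try:
    defaultdict(defaultdict(int))
except TypeError as exc:
    print("TypeError:", exc)

# Passing a zero-argument callable works; it is called once per missing key.
nested = defaultdict(lambda: defaultdict(int))
nested["a"]["b"] += 1
print(nested["a"]["b"])  # 1

# partial(defaultdict, int) is also a zero-argument callable returning defaultdict(int).
three = defaultdict(partial(defaultdict, partial(defaultdict, int)))
three["x"]["y"]["z"] += 1
print(three["x"]["y"]["z"])  # 1
```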

Answer 4

If you only need to extract and retrieve trigrams, you should try NLTK:

>>> import nltk
>>> sent = "this is a foo bar crazycoder"
>>> # nltk.ngrams returns an iterator in NLTK 3, so build a list to reuse it
>>> trigrams = list(nltk.ngrams(sent.split(), 3))
>>> trigrams
[('this', 'is', 'a'), ('is', 'a', 'foo'), ('a', 'foo', 'bar'), ('foo', 'bar', 'crazycoder')]
>>> # token "a" in first element of trigram
>>> [i for i in trigrams if i[0] == "a"]
[('a', 'foo', 'bar')]
>>> # token "a" in 2nd element of trigram
>>> [i for i in trigrams if i[1] == "a"]
[('is', 'a', 'foo')]
>>> # token "a" in third element of trigram
>>> [i for i in trigrams if i[2] == "a"]
[('this', 'is', 'a')]
>>> # look for 2gram in trigrams
>>> [i for i in trigrams if "foo" in i and "bar" in i]
[('a', 'foo', 'bar'), ('foo', 'bar', 'crazycoder')]
>>> # look for a perfect 3gram (compare tuple to tuple, not list to tuple)
>>> [i for i in trigrams if tuple("foo bar crazycoder".split()) == i]
[('foo', 'bar', 'crazycoder')]
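If only the counts are needed, a flat alternative to nested dictionaries is to key a Counter by the trigram tuple. This is a sketch using only the standard library rather than NLTK; it reuses the sample sentence from above, and the variable name `trigram_counts` is just illustrative.

```python
from collections import Counter

sent = "this is a foo bar crazycoder"
words = sent.split()

# Zipping three staggered views of the token list yields consecutive triples.
trigram_counts = Counter(zip(words, words[1:], words[2:]))

print(trigram_counts[("a", "foo", "bar")])    # 1
print(trigram_counts[("foo", "bar", "baz")])  # 0, unseen trigrams count as zero
```

Unlike a defaultdict, looking up a missing key in a Counter returns 0 without inserting it.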

Answer 1: score 12, accepted, 2013-09-28 04:08:30
Answer 2: score 5, 2013-09-28 04:47:24
Answer 3: score 1, 2013-09-28 05:14:27
Answer 4: score 0, 2013-09-28 04:56:02