Python: Nested dictionary - create if key doesn't exist, else sum 1
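Scenario
For a list of sentences, I am trying to count how many times each word appears in each sentence. Each sentence is a list of words.
I want the final dictionary to have a key for every word in the whole corpus, a second key indicating the sentence the word appears in, and a value giving the number of times the word appears in that sentence.
Current solution
The following code works correctly: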
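dfm = dict()
for i, sentence in enumerate(sentences):
    for word in sentence:
        # Create the inner dict the first time a word is seen.
        if word not in dfm.keys():
            dfm[word] = dict()
        # Start the count at 1 for a new sentence index, else increment it.
        if i not in dfm[word].keys():
            dfm[word][i] = 1
        else:
            dfm[word][i] += 1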
Question
Is there a cleaner way to do this in Python?
I have gone through other posts where they suggest using:
dic.setdefault(key,[]).append(value)
and
d = defaultdict(lambda: defaultdict(dict))
I think they are good solutions, but I can't figure out how to adapt them to my particular case.
Thanks!
Assuming you have this input:
sentences = [['dog','is','big'],['cat', 'is', 'big'], ['cat', 'is', 'dark']]
Your solution:
dfm = dict()
for i, sentence in enumerate(sentences):
    for word in sentence:
        if word not in dfm.keys():
            dfm[word] = dict()
        if i not in dfm[word].keys():
            dfm[word][i] = 1
        else:
            dfm[word][i] += 1
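With a defaultdict of int:
from collections import defaultdict

# Missing words get an inner defaultdict(int); missing sentence indices start at 0.
dfm2 = defaultdict(lambda: defaultdict(int))
for i, sentence in enumerate(sentences):
    for word in sentence:
        dfm2[word][i] += 1
Test: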
dfm2 == dfm # True
#{'dog': {0: 1},
# 'is': {0: 1, 1: 1, 2: 1},
# 'big': {0: 1, 1: 1},
# 'cat': {1: 1, 2: 1},
# 'dark': {2: 1}}
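The question also mentions dict.setdefault. As a minimal sketch of how that approach might be adapted to the nested count, using only plain dicts (the names dfm3 and per_sentence are made up for illustration and do not come from the original posts):

```python
# Same sample input as in the answer above.
sentences = [['dog', 'is', 'big'], ['cat', 'is', 'big'], ['cat', 'is', 'dark']]

dfm3 = {}
for i, sentence in enumerate(sentences):
    for word in sentence:
        # setdefault returns the inner dict for this word, creating it if missing.
        per_sentence = dfm3.setdefault(word, {})
        # get() supplies 0 the first time the word is seen in sentence i.
        per_sentence[i] = per_sentence.get(i, 0) + 1

print(dfm3)
```

The result is a plain nested dict, which may be preferable when a lookup of an unknown word should raise KeyError rather than silently create an entry, as a defaultdict would.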
For a cleaner version, use Counter:
from collections import Counter
string = 'this is america this is america'
x=Counter(string.split())
print(x)
output
Counter({'this': 2, 'is': 2, 'america': 2})
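If you would rather write the code yourself (input data, sentences, copied from @rassar):
def func(list_: list):
    dic = {}
    for sub_list in list_:
        for word in sub_list:
            # First occurrence starts the count at 1, later ones increment it.
            if word not in dic:
                dic[word] = 1
            else:
                dic[word] += 1
    return dic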
sentences = [['dog','is','big'],['cat', 'is', 'big'], ['cat', 'is', 'dark']]
print(func(sentences))
output
{'dog': 1, 'is': 3, 'big': 2, 'cat': 2, 'dark': 1}
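Both snippets above count words across the whole corpus, so the sentence index asked for in the question is lost. One possible way to keep it while still using Counter is sketched below; the names per_word and totals are illustrative assumptions, not part of the original answer.

```python
from collections import Counter, defaultdict

sentences = [['dog', 'is', 'big'], ['cat', 'is', 'big'], ['cat', 'is', 'dark']]

# Outer key: word. Inner Counter: sentence index -> occurrences in that sentence.
per_word = defaultdict(Counter)
for i, sentence in enumerate(sentences):
    for word in sentence:
        per_word[word][i] += 1

# Summing each inner Counter recovers the corpus-wide totals.
totals = {word: sum(counts.values()) for word, counts in per_word.items()}

print(dict(per_word))
print(totals)
```

Here totals should match the dictionary printed by func(sentences), while per_word keeps the per-sentence breakdown.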
from collections import Counter
sentences = ["This is Day", "Never say die", "Chat is a good bot", "Hello World", "Two plus two equals four","A quick brown fox jumps over the lazy dog", "Young chef, bring whisky with fifteen hydrogen ice cubes"]
sentenceWords = ( Counter(x.lower() for x in sentence.split()) for sentence in sentences)
#print result
print("\n".join(str(c) for c in sentenceWords))
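The snippet above yields one Counter per sentence, keyed by word. If the word -> sentence index -> count layout from the question is needed, those per-sentence counters could be inverted roughly as follows. This is only a sketch: the input is shortened and the name index_by_word is hypothetical.

```python
from collections import Counter

# Shortened sample input; sentences are plain strings, as in the answer above.
sentences = ["This is Day", "Two plus two equals four"]

index_by_word = {}
for i, sentence in enumerate(sentences):
    # One Counter per sentence, lower-cased as in the answer above.
    for word, n in Counter(w.lower() for w in sentence.split()).items():
        index_by_word.setdefault(word, {})[i] = n

print(index_by_word)
```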