Dictionary comprehension counter and refactoring Python code

Question

I'm teaching myself Python, and I've started refactoring Python code to learn new, more efficient ways of writing it.

I tried to write a dictionary comprehension for word_dict, but I couldn't find a way to do it. I ran into two problems:

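I tried to express word_dict[word] += 1 inside the dictionary comprehension by writing word_dict[word]:=word_dict[word]+1.
I wanted to use if word not in word_dict to check whether the element is already in the dictionary that the comprehension is building, and it doesn't work.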

The dictionary comprehension is:

word_dict = {word_dict[word]:= 0 if word not in word_dict else word_dict[word]:= word_dict[word] + 1 for word in text_split}

Here is the code; it reads a text and counts the distinct words in it. If you know a better way, please let me know.

import re

text = "hello Hello, water! WATER:HELLO. water , HELLO"

# clean the text
text_cleaned = re.sub(r':|!|,|\.', " ", text)
# Output 'hello Hello  water  WATER HELLO  water   HELLO'

# build a list of words, dropping the empty strings left by repeated spaces
text_split = [element for element in text_cleaned.split(' ') if element != '']
# Output ['hello', 'Hello', 'water', 'WATER', 'HELLO', 'water', 'HELLO']

word_dict = {}

for word in text_split:
    if word not in word_dict:
        word_dict[word] = 0 
    word_dict[word] += 1

word_dict
# Output {'hello': 1, 'Hello': 1, 'water': 2, 'WATER': 1, 'HELLO': 2}

Answer 1

Welcome to Python. The collections library ( https://docs.python.org/3/library/collections.html ) has a class called Counter, which seems very likely to fit your code. Would that work for you?

from collections import Counter
...
word_dict = Counter(text_split)
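
A minimal sketch of how this could slot into the code from the question. It assumes text_split holds the list the question's code produces; the list is written out literally here only so the snippet runs on its own. Counter is a dict subclass, so the result can be indexed like the original word_dict.

```python
from collections import Counter

# Assumption: the same list the question's code builds as text_split.
text_split = ['hello', 'Hello', 'water', 'WATER', 'HELLO', 'water', 'HELLO']

word_dict = Counter(text_split)

# Counter subclasses dict, so ordinary lookups work.
print(word_dict['water'])        # 2
# Missing keys return 0 instead of raising KeyError.
print(word_dict['missing'])      # 0
# The two most frequent words; ties keep first-seen order.
print(word_dict.most_common(2))  # [('water', 2), ('HELLO', 2)]
```

This replaces the explicit loop and the "if word not in word_dict" check, since Counter handles the first occurrence of each word itself.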

Answer 2

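Right now you are using a regular expression to remove some unwanted characters, and then splitting on spaces to get a list of words. Why not use a regular expression to get the words right away? You can also take advantage of collections.Counter to create a dictionary where the keys are the words and the associated values are the counts/occurrences: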

import re
from collections import Counter

text = "hello Hello, water! WATER:HELLO. water , HELLO"

pattern = r"\b\w+\b"

print(Counter(re.findall(pattern, text)))

Output:

Counter({'water': 2, 'HELLO': 2, 'hello': 1, 'Hello': 1, 'WATER': 1})

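Here is a breakdown of the regular expression pattern:

\b - a word boundary (not included in the match)
\w+ - one or more characters from the set [a-zA-Z0-9_]. For str patterns in Python 3, \w also matches Unicode letters and digits unless the re.ASCII flag is set.
\b - another word boundary (also not included in the match)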
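
To see what the pieces of the pattern do in practice, here is a small sketch. The sample string is made up for illustration and is not from the question. It shows that \w+ keeps digits and underscores inside a word, while punctuation, hyphens and spaces act as separators.

```python
import re

pattern = r"\b\w+\b"

# Hypothetical sample text, chosen to exercise digits, underscores and punctuation.
sample = "snake_case, word2 and x-ray!"

words = re.findall(pattern, sample)
print(words)  # ['snake_case', 'word2', 'and', 'x', 'ray']
```

Note that the hyphen in "x-ray" is not a word character, so the pattern splits it into two matches.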
