Dictionary comprehension counter and refactoring Python code

Question

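I am teaching myself Python, and I have started refactoring Python code to learn new, more efficient ways to write it.

I tried to write a dict comprehension for word_dict, but I couldn't find a way to do it. I ran into two problems:

I tried to use word_dict[word] := word_dict[word] + 1 in my dict comprehension to get the effect of word_dict[word] += 1.
I wanted to use if word not in word_dict to check whether the element is already in the dict comprehension (the one I'm creating), and it doesn't work.

The dict comprehension is: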

word_dict = {word_dict[word]:= 0 if word not in word_dict else word_dict[word]:= word_dict[word] + 1 for word in text_split}

Here is the code; it reads a text and counts the distinct words in it. If you know a better way, please let me know.

import re

text = "hello Hello, water! WATER:HELLO. water , HELLO"

# clean the text: replace punctuation with spaces
text_cleaned = re.sub(r':|!|,|\.', " ", text)
# Output 'hello Hello  water  WATER HELLO  water   HELLO'

# build a list of words, dropping the empty strings left by repeated spaces
text_split = [element for element in text_cleaned.split(' ') if element != '']
# Output ['hello', 'Hello', 'water', 'WATER', 'HELLO', 'water', 'HELLO']

word_dict = {}

for word in text_split:
    if word not in word_dict:
        word_dict[word] = 0 
    word_dict[word] += 1

word_dict
# Output {'hello': 1, 'Hello': 1, 'water': 2, 'WATER': 1, 'HELLO': 2}

Answer 1

Welcome to Python. There is the library collections ( https://docs.python.org/3/library/collections.html ), which has a class called Counter. It seems very likely to fit your code. Is that what you are after?

from collections import Counter
...
word_dict = Counter(text_split)
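
As a minimal sketch of how this fits together, the snippet below runs Counter on the word list from the question and compares it with the result of the original loop. The name comp_dict and the count()-based comprehension are illustrative additions, not part of the answer above; they show one way a dict comprehension can be made to work, at the cost of rescanning the list once per distinct word.

```python
from collections import Counter

# Word list as produced by the question's cleaning step
text_split = ['hello', 'Hello', 'water', 'WATER', 'HELLO', 'water', 'HELLO']

# Counter is a dict subclass, so it compares equal to a plain dict
word_dict = Counter(text_split)
print(word_dict == {'hello': 1, 'Hello': 1, 'water': 2, 'WATER': 1, 'HELLO': 2})  # True

# The two most frequent words, ties kept in first-seen order
print(word_dict.most_common(2))  # [('water', 2), ('HELLO', 2)]

# Illustrative alternative: a dict comprehension that does work.
# It calls list.count() once per distinct word, so it is quadratic
# in the worst case, while Counter makes a single pass.
comp_dict = {word: text_split.count(word) for word in set(text_split)}
print(comp_dict == word_dict)  # True
```

The comprehension in the question cannot work for two reasons: the := operator only accepts a plain name as its target, so word_dict[word] := ... is a syntax error, and the name word_dict is only bound (or rebound) after the whole comprehension has been evaluated, so the comprehension cannot look up its own partial result.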

Answer 2

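Right now, you are using a regular expression to remove some unwanted characters, and then splitting on spaces to get a list of words. Why not use a regular expression to get the words right away? You can also take advantage of collections.Counter to create a dictionary whose keys are the words and whose values are the counts (numbers of occurrences):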

import re
from collections import Counter

text = "hello Hello, water! WATER:HELLO. water , HELLO"

pattern = r"\b\w+\b"

print(Counter(re.findall(pattern, text)))

Output:

Counter({'water': 2, 'HELLO': 2, 'hello': 1, 'Hello': 1, 'WATER': 1})

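Here is a breakdown of the regular expression pattern:

\b - a word boundary (zero-width, so it is not included in the match)
\w+ - one or more word characters: [a-zA-Z0-9_] for ASCII text; with str patterns in Python 3, \w also matches Unicode letters and digits
\b - another word boundary (also not included in the match)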
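
To make the breakdown concrete, here is a small sketch that applies the same pattern to the question's text and inspects the pieces. The variable names words, first and counts are only illustrative. It shows that re.findall yields the same word list the question built in two steps, and that the boundaries contribute no characters to a match.

```python
import re
from collections import Counter

text = "hello Hello, water! WATER:HELLO. water , HELLO"
pattern = r"\b\w+\b"

# findall returns every non-overlapping match, left to right
words = re.findall(pattern, text)
print(words)
# ['hello', 'Hello', 'water', 'WATER', 'HELLO', 'water', 'HELLO']

# \b is zero-width: the first match covers only the five letters of "hello"
first = re.search(pattern, text)
print(first.group(), first.span())  # hello (0, 5)

# A Counter returns 0 for words it has never seen instead of raising KeyError
counts = Counter(words)
print(counts["water"], counts["missing"])  # 2 0
```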
