How to count the same words in a sentence?

Question

I want to ask how to count the same words in a sentence (in Python).

For example, from a sentence like: "What a wonderful day. Birds are singing, children are laughing."

What I want to extract is: ['what':1, 'a':1, 'wonderful':1, 'day':1, 'birds':1, 'are':2, 'singing':1, 'children':1, 'laughing':1]

Here is what I have so far:

sent = "What a wonderful day. Birds are singing, children are laughing."
a = sent.split()  # assumed: `a` was undefined in the original snippet
b = set([word.lower() for word in a])
c = list(b)

If this code isn't appropriate for the job, please let me know. Thank you.

Answer 1

Use collections.Counter together with str.strip(string.punctuation) to strip punctuation from each word:

from collections import Counter
import string

sent = "What a wonderful day. Birds are singing, children are laughing."

c = Counter([x.strip(string.punctuation) for x in sent.split()])
print(c)

# Counter({'are': 2, 'What': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'Birds': 1, 'singing': 1, 'children': 1, 'laughing': 1})

If you want this to be case insensitive, convert to lowercase before counting, like below:

s = sent.lower().translate(str.maketrans('', '', string.punctuation))
c = Counter(s.split())
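The two punctuation-removal approaches in this answer are not quite equivalent, which the sketch below tries to illustrate. The sample sentence and the names `stripped`, `translated` and `table` are made up for the example and are not part of the original answer.

```python
from collections import Counter
import string

# Hypothetical sentence with punctuation inside words.
text = "Don't stop. It's a well-known song, don't you think?"

# strip() only trims punctuation from both ends of each token.
stripped = Counter(w.strip(string.punctuation).lower() for w in text.split())

# translate() deletes every punctuation character, wherever it occurs.
table = str.maketrans('', '', string.punctuation)
translated = Counter(text.lower().translate(table).split())

print(stripped["don't"])   # 2
print(translated["dont"])  # 2
print("well-known" in stripped, "wellknown" in translated)  # True True
```

Which behaviour is preferable depends on whether apostrophes and hyphens inside words should survive; for the sentence in the question both give the same counts.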

Answer 2

You can use Counter and re for this:

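import re
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
# Extract runs of ASCII letters, which drops punctuation and whitespace
words = re.findall(r"[A-Za-z]+", sent)
print(dict(Counter(words)))
# {'What': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'Birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}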
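Since the question asks for lowercase keys, here is a small sketch of the same regex idea with the sentence lowercased first. It assumes plain English text: the pattern only matches ASCII letters, so a word like "don't" would be split into "don" and "t", and accented letters would be dropped.

```python
import re
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."

# Lowercase the sentence, then pull out runs of ASCII letters.
# Assumption: the input is plain English without apostrophes or accents.
words = re.findall(r"[a-z]+", sent.lower())
counts = dict(Counter(words))
print(counts)
# {'what': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}
```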

Answer 3

collections.Counter can be used to count occurrences of anything in a list. That is a good start. It means, however, that we should first turn the sentence into a list of words and remove punctuation.

To make a list of the words, there is a method called .split() which splits the sentence on whitespace. And to remove punctuation, the method .strip() is a good choice.

As you already hint at, we should also normalize the case. For this, it is better to use .casefold() rather than .lower(). For some languages the two give different results; for example, the German "ß" is left unchanged by .lower() but becomes "ss" under .casefold().
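To make the difference between the two methods concrete, here is a tiny illustration. The German word is only an example chosen for this sketch and has nothing to do with the sentence in the question.

```python
# German sharp s: lower() leaves it alone, casefold() expands it to "ss".
a = "Straße"
b = "STRASSE"

print(a.lower() == b.lower())        # False
print(a.casefold() == b.casefold())  # True
```

For purely English text the two methods behave the same, so the choice only matters once other languages show up in the input.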

All in all, that leads to code looking somewhat like:

import string
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
words = [word.strip(string.punctuation).casefold() for word in sent.split()]
freq = Counter(words)
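Once the counter exists, it can be queried like a dictionary. The following sketch repeats the code above and adds a few lookups; the word "sunshine" is just an example of a word that is absent from the sentence.

```python
import string
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
words = [word.strip(string.punctuation).casefold() for word in sent.split()]
freq = Counter(words)

# Look up a single word; missing words count as 0 rather than raising KeyError.
print(freq["are"])       # 2
print(freq["sunshine"])  # 0

# The most frequent word(s), as (word, count) pairs.
print(freq.most_common(1))  # [('are', 2)]
```

One caveat: a token made only of punctuation, such as a lone dash, strips down to an empty string, so for messier input it may be worth filtering out empty strings before counting.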

3 answers

Answer 1
0 2020-06-06 04:44:36

Answer 2
0 2020-06-06 04:53:46

Answer 3
0 2020-06-06 04:55:31
