How can I count repeated words in a sentence?
I want to ask how to count repeated words in a sentence (in Python).
As an example, from a sentence like: "What a wonderful day. Birds are singing, children are laughing."
What I want to extract is: ['what':1, 'a':1, 'wonderful':1, 'day':1, 'birds':1, 'are':2, 'singing':1, 'children':1, 'laughing':1]
Here is what I have so far:
sent = "What a wonderful day. Birds are singing, children are laughing."
a = sent.split()  # assumed: `a` was not defined in the original post
b = set([word.lower() for word in a])
c = list(b)
If this code isn't appropriate for the job, please let me know. Thank you.
Use collections.Counter together with str.strip (passing string.punctuation) to strip punctuation:
from collections import Counter
import string
sent = "What a wonderful day. Birds are singing, children are laughing."
c = Counter([x.strip(string.punctuation) for x in sent.split()])
print(c)
# Counter({'are': 2, 'What': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'Birds': 1, 'singing': 1, 'children': 1, 'laughing': 1})
If you want this to be case insensitive, convert to lowercase before counting, like below:
s = sent.lower().translate(str.maketrans('', '', string.punctuation))
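The line above only cleans the string; it does not count anything yet. A minimal sketch of the remaining step, assuming the cleaned string is then split on whitespace and passed to Counter (the names table and counts are illustrative, not from the answer):

```python
from collections import Counter
import string

sent = "What a wonderful day. Birds are singing, children are laughing."

# Lowercase the sentence and delete every ASCII punctuation character.
table = str.maketrans('', '', string.punctuation)
s = sent.lower().translate(table)

# Split on whitespace and count the resulting words.
counts = Counter(s.split())
print(counts)
# Counter({'are': 2, 'what': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'birds': 1, 'singing': 1, 'children': 1, 'laughing': 1})
```

Note that translate deletes punctuation everywhere, including inside words, so "don't" would become "dont" with this approach.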
collections.Counter can be used to count occurrences of the items in a list. That is a good start. It does mean, however, that we should first turn the sentence into a list of words and remove the punctuation.
To make a list of the words, there is a method called .split() which splits the sentence on whitespace. To remove punctuation, the method .strip() is a good choice.
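As a rough illustration of these two steps on the sentence from the question (the variable names tokens and words are just for this sketch):

```python
import string

sent = "What a wonderful day. Birds are singing, children are laughing."

# .split() with no arguments splits on any run of whitespace.
tokens = sent.split()
print(tokens)
# ['What', 'a', 'wonderful', 'day.', 'Birds', 'are', 'singing,', 'children', 'are', 'laughing.']

# .strip(chars) removes the given characters from both ends only,
# so punctuation inside a word (e.g. an apostrophe) is left alone.
words = [token.strip(string.punctuation) for token in tokens]
print(words)
# ['What', 'a', 'wonderful', 'day', 'Birds', 'are', 'singing', 'children', 'are', 'laughing']

print("don't!".strip(string.punctuation))  # don't
```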
As you already hint at, we should also normalize the case. For this, it is better to use .casefold() rather than .lower(). For text in some languages the two do not give identical results (the German 'ß' is one example).
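A small sketch of where the two methods differ, using a German word picked only for illustration:

```python
word = "Straße"  # German for "street"; chosen only for illustration

print(word.lower())     # straße
print(word.casefold())  # strasse

# casefold() makes the two spellings compare equal; lower() does not.
print("STRASSE".lower() == word.lower())        # False
print("STRASSE".casefold() == word.casefold())  # True
```

For plain English text such as the sentence in the question, both methods give the same result.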
All in all, that leads to code looking somewhat like:
import string
from collections import Counter
sent = "What a wonderful day. Birds are singing, children are laughing."
words = [word.strip(string.punctuation).casefold() for word in sent.split()]
freq = Counter(words)
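The resulting Counter behaves like a dictionary, so the counts can be read back in a few ways. A short usage sketch, repeating the code above so it runs on its own:

```python
import string
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
words = [word.strip(string.punctuation).casefold() for word in sent.split()]
freq = Counter(words)

print(freq["are"])          # 2
print(freq["missing"])      # 0 (a Counter returns 0 for unseen keys)
print(freq.most_common(1))  # [('are', 2)]
print(dict(freq))           # plain dict, in first-seen order
```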