
How can I count the same words in a sentence?

I want to ask how to count occurrences of the same word in a sentence (in Python).

For example, given a sentence like: "What a wonderful day. Birds are singing, children are laughing."

What I want to extract is: {'what': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}

Here is what I have so far:

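sent = "What a wonderful day. Birds are singing, children are laughing."
a = sent.split()  # assumed: `a` was not defined in the original snippet
b = set([word.lower() for word in a])  # unique lowercase tokens; the counts are lost
c = list(b)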

If this code isn't appropriate for the job, please let me know. Thank you.

Use collections.Counter, with str.strip(string.punctuation) to strip punctuation from each word:

from collections import Counter
import string

sent = "What a wonderful day. Birds are singing, children are laughing."

c = Counter([x.strip(string.punctuation) for x in sent.split()])
print(c)

# Counter({'are': 2, 'What': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'Birds': 1, 'singing': 1, 'children': 1, 'laughing': 1})

If you want this to be case-insensitive, convert to lowercase before counting, like below:

s = sent.lower().translate(str.maketrans('', '', string.punctuation))
c = Counter(s.split())  # count the lowercased, punctuation-free words
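One difference between the two approaches is worth noting: str.strip only trims punctuation from the ends of each token, while translate deletes it everywhere, including inside words. A minimal sketch of that difference, using a made-up sentence that is not from the question:

```python
import string
from collections import Counter

# Hypothetical sentence chosen to contain punctuation inside words.
text = "Don't stop. Don't look back, well-known friend."

# strip() only trims punctuation at the ends of each token.
stripped = Counter(w.strip(string.punctuation).lower() for w in text.split())

# translate() deletes every punctuation character, including inner ones.
table = str.maketrans('', '', string.punctuation)
translated = Counter(text.lower().translate(table).split())

print(stripped["don't"])   # 2
print(translated["dont"])  # 2
```

Which behaviour is preferable depends on whether contractions and hyphenated words should keep their inner punctuation.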

You can use Counter and re for this:

import re
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
# Keep only runs of ASCII letters, which drops the punctuation.
words = re.findall("[A-Za-z]+", sent)
print(dict(Counter(words)))
# {'What': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'Birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}
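The regex approach as written is case-sensitive. A small variation, assuming ASCII-only text as in the question's example, lowercases the sentence before matching so that the result has the lowercase keys the question asks for:

```python
import re
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."

# Lowercase first so "What" and "what" would be counted together,
# then keep only runs of ASCII letters.
words = re.findall(r"[a-z]+", sent.lower())
counts = dict(Counter(words))
print(counts)
# {'what': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}
```

Note that this pattern would split words containing apostrophes or non-ASCII letters, so it is only a sketch for plain English input.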

collections.Counter can be used to count occurrences of anything in a list, so it is a good start. It does mean, however, that we should first turn the sentence into a list of words and remove punctuation.

To make a list of the words, there is a method called .split(), which splits the sentence on whitespace. To remove punctuation, the method .strip() is a good choice: it trims the given characters from both ends of each word.

As you already hint at, we should also normalize the case. For this, it is better to use .casefold() rather than .lower(). For some characters, such as the German "ß", the two do not give the same result.
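To illustrate where the two methods differ, here is a short sketch using the German sharp s, one character that .casefold() maps differently from .lower(). The example words are illustrative and not from the question:

```python
# "ß" (German sharp s) is left unchanged by lower() but folded to "ss" by casefold().
lowered = "Straße".lower()
folded = "Straße".casefold()

print(lowered)  # straße
print(folded)   # strasse

# With casefold(), the two spellings compare equal; with lower(), they do not.
print("STRASSE".casefold() == "Straße".casefold())  # True
print("STRASSE".lower() == "Straße".lower())        # False
```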

All in all, that leads to code looking somewhat like:

import string
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
words = [word.strip(string.punctuation).casefold() for word in sent.split()]
freq = Counter(words)
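As a usage sketch, the resulting Counter can be queried like a dictionary. The snippet repeats the setup so that it runs on its own:

```python
import string
from collections import Counter

sent = "What a wonderful day. Birds are singing, children are laughing."
words = [word.strip(string.punctuation).casefold() for word in sent.split()]
freq = Counter(words)

print(freq["are"])          # 2
print(freq["missing"])      # 0 (a Counter returns 0 for unseen keys)
print(freq.most_common(1))  # [('are', 2)]
print(dict(freq))
# {'what': 1, 'a': 1, 'wonderful': 1, 'day': 1, 'birds': 1, 'are': 2, 'singing': 1, 'children': 1, 'laughing': 1}
```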

