Counting occurrences of a word in chunks in Python (list comprehension)
I am very new to programming, so my apologies if this is too basic a question.
I am trying to count all the occurrences of a word by chunks, and then I need to plot those results. My text is Pride and Prejudice, and I am trying to find how frequent the name 'Mr. Darcy' is in chunks of 3000 words. So I tried the following, unsuccessfully:
x = [chunk.count('Mr. Darcy') for chunk in partition(100000, text1_pride)]
Can anyone help? Thanks a lot.
As stated in the comments, "Mr. Darcy" would be counted as 2 words if you split on spaces. If you want to look for just "Darcy", and your string is called text1_pride, you could do something like this:
words = text1_pride.split()
chunks = [words[x:x+3000] for x in range(0, len(words), 3000)]
darcy_counts = [chunk.count('Darcy') for chunk in chunks]
This could all be done in one line, with nested list comprehensions.
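As a rough sketch of that one-line form: the sample string and the small chunk size below are made up for illustration (the real case would use the full novel and 3000), and only the name text1_pride comes from the question.

```python
# Hypothetical sample text standing in for the novel (assumption, not real data)
text1_pride = "Darcy met Darcy . Elizabeth left . Darcy bowed"
chunk_size = 3  # would be 3000 for the real text

words = text1_pride.split()

# Inner comprehension builds the word chunks, outer one counts 'Darcy' in each
darcy_counts = [chunk.count('Darcy')
                for chunk in [words[i:i + chunk_size]
                              for i in range(0, len(words), chunk_size)]]

print(darcy_counts)  # [2, 0, 1]
```

Note that list.count only matches whole tokens, so a word such as "Darcy," with attached punctuation would not be counted by this approach.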
A minimal version of what you want to do, based on random data, would be:
import random
import loremipsum
text = ' '.join(loremipsum.get_sentences(400)).split() # split into words
# positions at which a word is replaced with "Mr. Darcy"
where = [random.randint(1, len(text) - 1) for _ in range(1000)]
for p in where:
    text[p] = "Mr. Darcy"
# text stays a list of words so that it can be sliced into word chunks below
chunk_size = 100
# check for chunk_size list elements (some containing "Mr. Darcy" - most not)
# joins each chunk into a text then looks for Mr. Darcy
x = [' '.join(chunk).count('Mr. Darcy') for chunk in (
    text[i: i + chunk_size] for i in range(0, len(text), chunk_size))]
print(x)
Output:
[34, 28, 28, 34, 35, 22, 25, 31, 26, 32, 23, 21, 37, 32, 29, 40, 30,
28, 40, 29, 35, 31, 25, 34, 28, 31, 32, 11]
For your own file, you would need to do:
with open("yourfile.txt") as f:
    text = f.read().split()
chunk_size = 3000
chunks = [' '.join(text[i: i + chunk_size]) for i in range(0, len(text), chunk_size)]
and then count the occurrences in each chunk in chunks.
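A minimal sketch of that last counting step might look like the following. The helper name count_phrase_per_chunk and the sample sentence are hypothetical and chosen only for illustration; the chunking itself follows the approach shown above.

```python
def count_phrase_per_chunk(text, phrase, chunk_size=3000):
    """Split text into chunks of chunk_size words and count phrase in each."""
    words = text.split()
    # Re-join each slice of words so a multi-word phrase can be matched
    chunks = [' '.join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    return [chunk.count(phrase) for chunk in chunks]


# Hypothetical sample text (assumption); a tiny chunk size keeps it readable
sample = "Mr. Darcy bowed . Mr. Darcy left . Elizabeth smiled at Mr. Darcy"
counts = count_phrase_per_chunk(sample, "Mr. Darcy", chunk_size=4)
print(counts)  # [1, 1, 0, 0]
```

In this sample the last "Mr. Darcy" is split across two chunks ("Mr." ends one chunk and "Darcy" starts the next), so it is not counted. With chunks of 3000 words this happens rarely, but the per-chunk counts can slightly undercount the total.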