How to count words in text and append in dictionary?
I'm trying to build a dictionary of word frequencies for a text, but for some reason extra characters print out (I'm not sure whether the problem is my text or my code), and it doesn't successfully print the lines or words that contain an invalid symbol! This is the code I have:
    def parse_documentation(filename):
        filename=open(filename, "r")
        lines = filename.read();
        invalidsymbols=["`","~","!", "@","#","$"]
        for line in lines:
            for x in invalidsymbols:
                if x in line:
                    print(line)
                    print(x)
                    print(line.replace(x, ""))
            freq={}
            for word in line:
                count=counter(word)
                freq[word]=count
        return freq
Your code has several flaws. I will not solve all of them, but I will point you in the right direction.
Firstly, read() reads the whole file as a single string, so looping over the result gives you one character at a time. I don't think that's your intention here. Use readlines() instead to get all lines in the file as a list.
    def parse_documentation(filename):
        with open(filename, "r") as f:
            lines = f.readlines()  # returns a list of all lines in the file
        invalidsymbols = ["`", "~", "!", "@", "#", "$"]
        freq = {}  # declare this OUTSIDE of your loop
        for line in lines:
            for letter in line:
                if letter in invalidsymbols:
                    print(letter)
                    line = line.replace(letter, "")
            print(line)  # this should print the line without invalid symbols
            words = line.split()  # now get the words
            for word in words:  # loop over the words, not over the characters of line
                # ... do your counting here and update freq ...
                pass
        return freq
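To make the difference between read() and readlines() concrete, here is a minimal sketch. It uses io.StringIO as an in-memory stand-in for the real file, and the sample text is made up for illustration, so treat it as a demonstration rather than part of the fix.

```python
import io

# Hypothetical sample text standing in for the contents of the real file.
sample = "this is a text.\n@this #that\n"

# read() returns one string; looping over it yields single characters.
chars = [c for c in io.StringIO(sample).read()]

# readlines() returns a list; looping over it yields whole lines.
lines = io.StringIO(sample).readlines()

print(chars[:4])  # ['t', 'h', 'i', 's']
print(lines)      # ['this is a text.\n', '@this #that\n']
```

This is why the loop in the question printed individual characters: each "line" it saw was really a single character of the file.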
Second, I'm highly suspicious of the workings of your counter method. If your intention is to count how often each word occurs, you could adopt this strategy: check whether word is in freq. If it's not in freq, add it and map it to 1. Otherwise, increment the number that the word was previously mapped to. This should set you on the right track.
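The check-then-increment strategy described above could look roughly like this. The function name count_words and the sample sentence are hypothetical, chosen only for illustration; the point is the if/else on freq.

```python
def count_words(words):
    """Map each word to the number of times it appears in `words`."""
    freq = {}
    for word in words:
        if word not in freq:
            freq[word] = 1   # first time we see this word
        else:
            freq[word] += 1  # seen before: bump the existing count
    return freq


result = count_words("this is a text text is that".split())
print(result)  # {'this': 1, 'is': 2, 'a': 1, 'text': 2, 'that': 1}
```

The same loop body can be shortened to freq[word] = freq.get(word, 0) + 1 if you prefer a single line.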
Check this; it might be what you want. By the way, the code you posted is not valid Python. There are many issues in it.
    from collections import Counter

    def parse_documentation(filename):
        with open(filename, "r") as fin:
            lines = fin.read()
        # for sym in ["`","~","!","@","#","$"]: lines = lines.replace(sym, '')
        # Python 2 spelling was lines.translate(None, "`~!@#$"); thanks to @gnibbler's comment
        lines = lines.translate(str.maketrans("", "", "`~!@#$"))  # delete the invalid symbols
        freq = Counter(lines.split())
        return freq
Text file:
this is a text. text is that. @this #that
$this #!that is those
Results:
Counter({'this': 3, 'is': 3, 'that': 2, 'a': 1, 'that.': 1, 'text': 1, 'text.': 1, 'those': 1})
You might need line.split(' '); otherwise the for loop will iterate over the individual letters of the line instead of its words.
    ....
    for word in line.split(' '):
        count=counter(word)
    ...
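One caveat worth checking if you go this route: split(' ') and split() with no argument behave differently on repeated spaces and trailing newlines. The sample line below is made up for illustration.

```python
# Hypothetical line with a double space and a trailing newline.
line = "this  is a text.\n"

by_space = line.split(' ')    # splits on every single space character
by_whitespace = line.split()  # splits on any run of whitespace

print(by_space)       # ['this', '', 'is', 'a', 'text.\n']
print(by_whitespace)  # ['this', 'is', 'a', 'text.']
```

With split(' ') you get an empty string for the double space and the newline stays attached to the last word, so split() with no argument is usually the safer choice for counting words.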