Trying to create a dictionary from a text file
So, I have a text file (a paragraph), and I need to read the file and create a dictionary that contains each different word in the file as a key. The value for each key should be an integer showing how often that word appears in the text file. An example of what the dictionary should look like:
{'and':2, 'all':1, 'be':1, 'is':3}
etc.
So far I have this:
def create_word_frequency_dictionary():
    filename = 'dictionary.txt'
    infile = open(filename, 'r')
    line = infile.readline()
    my_dictionary = {}
    frequency = 0
    while line != '':
        row = line.lower()
        word_list = row.split()
        print(word_list)
        print(word_list[0])
        words = word_list[0]
        my_dictionary[words] = frequency + 1
        line = infile.readline()
    infile.close()
    print(my_dictionary)

create_word_frequency_dictionary()
Any help would be appreciated, thanks.
The documentation describes the collections module as "High-performance container datatypes". Consider using collections.Counter instead of reinventing the wheel:
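from collections import Counter

filename = 'dictionary.txt'
with open(filename, 'r') as infile:   # 'with' closes the file automatically
    text = infile.read().lower()      # lowercase so 'And' and 'and' are counted together
print(Counter(text.split()))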
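Because the input is a paragraph, words often carry punctuation ("is," and "is." would otherwise be counted as different keys). The following is a minimal sketch, assuming you want case-insensitive counts with leading and trailing punctuation ignored; `word_frequencies` is a hypothetical helper name, and it takes a string so it can be tried without `dictionary.txt`. It returns a plain `dict`, since `Counter` is a `dict` subclass and converts directly.

```python
import string
from collections import Counter


def word_frequencies(text):
    """Count words case-insensitively, ignoring surrounding punctuation."""
    # Strip characters such as ',' or '.' from both ends of each word,
    # so 'is,' and 'is' end up as the same key.
    words = (w.strip(string.punctuation) for w in text.lower().split())
    # Skip tokens that consisted only of punctuation (e.g. a lone '-').
    return dict(Counter(w for w in words if w))


sample = "All is well, and all is calm. And that is all."
counts = word_frequencies(sample)
print(counts)
```

If you only need the most frequent words, `Counter(...).most_common(n)` returns them as a list of `(word, count)` pairs.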
Update: Okay, I fixed your code and now it works, but Counter is still the better option:
def create_word_frequency_dictionary():
    filename = 'dictionary.txt'
    infile = open(filename, 'r')
    lines = infile.readlines()
    my_dictionary = {}
    for line in lines:
        row = line.lower()
        for word in row.split():
            if word in my_dictionary:
                my_dictionary[word] = my_dictionary[word] + 1
            else:
                my_dictionary[word] = 1
    infile.close()
    print(my_dictionary)

create_word_frequency_dictionary()
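A slightly more reusable variant might return the dictionary instead of printing it. This sketch assumes the filename is passed in as a parameter (the original hard-codes `'dictionary.txt'`), uses `with` so the file is always closed, and replaces the `if`/`else` with `dict.get`, which returns a default of 0 for words not seen yet. The usage demo writes a throwaway temporary file so it runs anywhere.

```python
import os
import tempfile


def create_word_frequency_dictionary(filename):
    """Return a dict mapping each lowercase word in filename to its count."""
    my_dictionary = {}
    # 'with' closes the file even if an exception is raised while reading.
    with open(filename, 'r') as infile:
        for line in infile:  # iterate line by line instead of readlines()
            for word in line.lower().split():
                # get() supplies 0 for unseen words, replacing the if/else.
                my_dictionary[word] = my_dictionary.get(word, 0) + 1
    return my_dictionary


# Usage demo: write a small sample file, count its words, then delete it.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("And all is well\nand that is that\n")
    sample_path = tmp.name
counts = create_word_frequency_dictionary(sample_path)
os.remove(sample_path)
print(counts)
```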
If you are using a version of Python that does not have Counter (it was added in Python 2.7 and 3.1), use collections.defaultdict:
>>> import collections
>>> words = ["a", "b", "a", "c"]
>>> word_frequency = collections.defaultdict(int)
>>> for w in words:
...     word_frequency[w] += 1
...
>>> print(word_frequency)
defaultdict(<type 'int'>, {'a': 2, 'c': 1, 'b': 1})
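Applied to the question's task, the same defaultdict pattern might be wrapped as below. This is a sketch: `count_words` is a hypothetical name, it works on a string rather than the file, and it converts the result to a plain `dict` so it prints in the `{'and': 2, ...}` format the question asks for. It avoids Python-3-only syntax, so it should also run on the older interpreters this answer targets.

```python
import collections


def count_words(text):
    """Count lowercase words with defaultdict (no Counter needed)."""
    # int() returns 0, so each missing key starts at 0 before += 1.
    word_frequency = collections.defaultdict(int)
    for w in text.lower().split():
        word_frequency[w] += 1
    # Convert to a plain dict so it prints without the defaultdict wrapper.
    return dict(word_frequency)


counts = count_words("The cat and the hat and the bat")
print(counts)
```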
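Just replace my_dictionary[words] = frequency+1 with my_dictionary[words] = my_dictionary.get(words, 0) + 1. (Writing my_dictionary[words] + 1 directly raises a KeyError the first time a word is seen, because the key does not exist yet.)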
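In the question's `while` loop, that line only ever sees `word_list[0]`, so just the first word of each line is counted, and a blank line makes `word_list[0]` raise an `IndexError`. A minimal sketch of the loop with both issues addressed, looping over every word on the line, is shown here. `count_words_in_file` is a hypothetical name, and `io.StringIO` stands in for `open('dictionary.txt')` so the example is self-contained.

```python
import io


def count_words_in_file(infile):
    """The question's readline() loop, changed to count every word."""
    my_dictionary = {}
    line = infile.readline()
    while line != '':  # readline() returns '' only at end of file
        # Loop over all words; a blank line just yields an empty list.
        for word in line.lower().split():
            my_dictionary[word] = my_dictionary.get(word, 0) + 1
        line = infile.readline()
    return my_dictionary


# io.StringIO behaves like an open text file for readline().
result = count_words_in_file(io.StringIO("and all is\n\nis and is be\n"))
print(result)
```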