I am making a program that reads a file and builds a dictionary showing how many times each word has been used:
filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().split()

dict = {}
for word in contents:
    if word not in dict:
        dict[word] = 1
    else:
        dict[word] += 1

dict = sorted(dict.items(), key=lambda x: x[1], reverse=True)
for i in dict:
    print(i[0], i[1])
It works, but it treats words followed by commas as different words, which I do not want. Is there a simple and efficient way to fix this?
Remove all commas before splitting:
filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().replace(",", "").split()
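A quick demonstration of the idea, using made-up sample text rather than the original file:

```python
# Sample text (assumed, not from the original post)
text = "one, two, one two"

# Drop the commas first, then split on whitespace
words = text.replace(",", "").split()
print(words)  # -> ['one', 'two', 'one', 'two']
```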
I'd suggest you strip() each word of the different punctuation chars before counting it. Also, don't use the built-in name dict as a variable; it shadows the dictionary constructor.
import string

words = {}
for word in contents:
    word = word.strip(string.punctuation)
    if word not in words:
        words[word] = 1
    else:
        words[word] += 1
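As a quick sanity check of what strip(string.punctuation) does (the sample words are made up), note that it only removes punctuation from the ends of a string, so interior apostrophes survive:

```python
import string

# Hypothetical sample words with various surrounding punctuation
samples = ["hello,", "world!", "(test)", "don't"]
stripped = [w.strip(string.punctuation) for w in samples]
print(stripped)  # -> ['hello', 'world', 'test', "don't"]
```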
For what it's worth, there is collections.Counter, which does this job:
import string
from collections import Counter

filename = 'test.txt'
with open(filename) as file:
    contents = file.read().split()

words = Counter(word.strip(string.punctuation) for word in contents)

for k, v in words.most_common():  # All words, in descending order of occurrence count
    print(k, v)

for k, v in words.most_common(5):  # Only the 5 most common words
    print(k, v)
You are splitting the whole data using " " as the delimiter, but not doing the same for commas. You can split each word further on commas. Here's how:
...
for word in contents:
    new_words = word.split(',')
    for new_word in new_words:
        if new_word not in dict:
            dict[new_word] = 1
        else:
            dict[new_word] += 1
...
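One caveat with this approach: splitting a word like "apple," on ',' produces an empty string, which would get counted too. A minimal sketch (with made-up sample data and a counts variable standing in for the question's dictionary) that filters those out:

```python
# Sample data standing in for the file contents (assumed)
contents = ["apple,", "banana", "apple"]

counts = {}
for word in contents:
    for new_word in word.split(','):
        if not new_word:  # skip empty strings produced by trailing commas
            continue
        counts[new_word] = counts.get(new_word, 0) + 1

print(counts)  # -> {'apple': 2, 'banana': 1}
```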