
Python: treat words with commas the same as those without in a dictionary

I am writing a program that reads a file and builds a dictionary showing how many times each word has been used:

filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().split()
dict = {}
for word in contents:
    if word not in dict:
        dict[word] = 1
    else:
        dict[word] += 1
    
dict = sorted(dict.items(), key=lambda x: x[1], reverse=True)

for i in dict:
    print(i[0], i[1])

It works, but it treats words that have commas attached to them as different words, which I do not want. Is there a simple and efficient way to fix this?

Remove all commas before splitting the text:

filename = 'for_python.txt'
with open(filename) as file:
    contents = file.read().replace(",", "").split()
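To see the effect without a file, here is a small sketch on an in-memory string (the sample text is made up for illustration). It compares splitting with and without the replace() call.

```python
# Made-up sample text standing in for the file contents.
text = "apple, banana, apple banana,"

with_commas = text.split()                       # commas stay attached to the words
without_commas = text.replace(",", "").split()   # commas removed before splitting

print(with_commas)     # ['apple,', 'banana,', 'apple', 'banana,']
print(without_commas)  # ['apple', 'banana', 'apple', 'banana']
```

Note that this only handles commas; other punctuation such as periods would still be left attached.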

I'd suggest calling strip() with the punctuation characters on each word as you use it. Also, don't use dict as a variable name: it shadows the built-in dictionary constructor.

import string
words = {}
for word in contents:
    word = word.strip(string.punctuation)
    if word not in words:
        words[word] = 1
    else:
        words[word] += 1
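As a quick illustration of what strip(string.punctuation) does and does not do, here is a minimal sketch on a few made-up tokens. It only trims punctuation from the two ends of a token, so an apostrophe inside a word is kept, and a token made purely of punctuation becomes an empty string.

```python
import string

# Made-up tokens, as they might come out of file.read().split().
samples = ["hello,", "(world)", "don't", "end.", "--"]

# strip() removes the given characters from both ends only.
cleaned = [token.strip(string.punctuation) for token in samples]

print(cleaned)  # ['hello', 'world', "don't", 'end', '']
```

The empty string at the end suggests that you may want to skip empty results before counting.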

For your information, collections.Counter exists and does this job for you:

import string
from collections import Counter

filename = 'test.txt'
with open(filename) as file:
    contents = file.read().split()

words = Counter(word.strip(string.punctuation) for word in contents)

for k, v in words.most_common():  # All words, in descending order of occurrence count
    print(k, v)
for k, v in words.most_common(5):  # Only the 5 most frequent words
    print(k, v)
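The Counter approach can also be wrapped in a small function that works on any string, which makes it easy to try without a file. This is only a sketch: the name count_words and the sample sentence are hypothetical, and skipping empty tokens is an added assumption rather than part of the code above.

```python
import string
from collections import Counter


def count_words(text):
    """Count words in text, ignoring punctuation at the ends of each token."""
    stripped = (token.strip(string.punctuation) for token in text.split())
    # Tokens that were pure punctuation (such as "-") strip down to "" and are skipped.
    return Counter(word for word in stripped if word)


counts = count_words("red, green - red. blue, red")
print(counts.most_common(1))  # [('red', 3)]
```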

You are splitting the text on whitespace but not on commas. You can handle commas by splitting each word further on them. Here's how:

...
for word in contents:
    new_words = word.split(',')
    for new_word in new_words:
        if not new_word:
            # 'word,'.split(',') gives ['word', ''], so skip the empty piece
            continue
        if new_word not in dict:
            dict[new_word] = 1
        else:
            dict[new_word] += 1
...
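A possible alternative, not part of the answer above, is to split on whitespace and commas in a single step with re.split. The sketch below uses a made-up sample string; a trailing comma still produces an empty piece, so empty strings are filtered out afterwards.

```python
import re

# Made-up sample text standing in for the file contents.
text = "one,two three, four,"

# Split on any run of commas and/or whitespace.
pieces = re.split(r"[,\s]+", text)

# The trailing comma leaves an empty string at the end; drop it.
words = [piece for piece in pieces if piece]

print(pieces)  # ['one', 'two', 'three', 'four', '']
print(words)   # ['one', 'two', 'three', 'four']
```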

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
