Read words from file into dictionary

So, in our assignment, my professor would like us to read in a text file line by line, then word by word, and then create a dictionary counting how often each word appears. Here's what I have for now:

import string

def count_words():  # wrapper so the `return` below is valid; the name is illustrative
    wordcount = {}
    with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
        for line in f:
            for word in line.split():
                line = line.lower()
                word = word.strip(string.punctuation + string.digits)
                if word:
                    wordcount[word] = line.count(word)
    return wordcount

What happens is that my dictionary tells me how many times each word appears in a particular line, leaving me with mostly 1s even when some words show up many times across the entire text. How can I get my dictionary to count words across the entire text, not just a single line?

The problem is that you are resetting the count every time instead of adding to it. The fix is quite simple:

import string

def count_words():
    wordcount = {}
    with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
        for line in f:
            for word in line.lower().split():
                word = word.strip(string.punctuation + string.digits)
                if word:
                    # add to the running total instead of replacing it;
                    # the loop visits every occurrence, so add 1 each time
                    if word in wordcount:
                        wordcount[word] += 1
                    else:
                        wordcount[word] = 1
    return wordcount
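To see the difference concretely, here is a minimal sketch contrasting the two update patterns on a small in-memory list of lines rather than a file. The function names `overwrite_counts` and `accumulate_counts` and the sample lines are hypothetical: the first replaces the count on every assignment, as in the question; the second adds 1 for each occurrence.

```python
import string

def overwrite_counts(lines):
    # Pattern from the question: each assignment replaces the previous value,
    # so a word's final count comes only from the last line it appeared in.
    counts = {}
    for line in lines:
        line = line.lower()
        for word in line.split():
            word = word.strip(string.punctuation + string.digits)
            if word:
                counts[word] = line.count(word)
    return counts

def accumulate_counts(lines):
    # Accumulating pattern: add one for every occurrence the loop visits.
    counts = {}
    for line in lines:
        for word in line.lower().split():
            word = word.strip(string.punctuation + string.digits)
            if word:
                counts[word] = counts.get(word, 0) + 1
    return counts

sample = ["Red fish, blue fish.", "One fish, two fish."]
print(overwrite_counts(sample)["fish"])   # 2: only the last line's count survives
print(accumulate_counts(sample)["fish"])  # 4: counted across the whole text
```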

The problem is in this line:

wordcount[word] = line.count(word)

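Every time that line executes, whatever value wordcount[word] held is replaced by line.count(word), when you want to add to it instead. Try changing it to the following (dict.get returns 0 for a word that is not in the dictionary yet, which avoids a KeyError on its first occurrence, and adding 1 per visit is enough because the loop already visits every occurrence):

wordcount[word] = wordcount.get(word, 0) + 1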
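Why adding `line.count(word)` is risky even once the value accumulates: `str.count` matches substrings rather than whole words, and it is evaluated once per occurrence of the word in the loop. A small demonstration on an invented sample sentence:

```python
line = "the theme of the story"

# str.count matches substrings, so "the" is also found inside "theme".
substring_hits = line.count("the")
print(substring_hits)  # 3

# Counting whole words from split() avoids that.
words = line.split()
print(words.count("the"))  # 2

# Adding line.count(word) on every visit multiplies repeats:
# "the" is visited twice, and each visit adds 3.
total = 0
for word in words:
    if word == "the":
        total += line.count(word)
print(total)  # 6
```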

This is how I would do it:

import string

wordcount = {}
# translation table for str.translate that deletes all punctuation characters
strip_punct = str.maketrans('', '', string.punctuation)
with open('test.txt', 'r') as f:
    for line in f:
        line = line.lower()  # I suppose you want boy and Boy to be the same word
        for word in line.split():
            # what if your word has funky punctuation chars next to it?
            word = word.translate(strip_punct)
            if not word:  # the token was nothing but punctuation
                continue
            # if it's already in the dict, increase the count
            try:
                wordcount[word] += 1
            # if it's not, this is the first time we are adding it
            except KeyError:
                wordcount[word] = 1

print(wordcount)
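The "already in the dict?" step can also be handled by `collections.defaultdict`, which creates a missing key with a default value on first access. A minimal sketch, assuming the input is any iterable of text lines; the `word_frequencies` name and the sample lines are hypothetical:

```python
import string
from collections import defaultdict

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def word_frequencies(lines):
    # int() returns 0, so a new word starts at 0 before the increment.
    wordcount = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            word = word.translate(STRIP_PUNCT)
            if word:  # skip tokens that were pure punctuation
                wordcount[word] += 1
    return dict(wordcount)

print(word_frequencies(["Boy meets boy.", "The boy -- again!"]))
# {'boy': 3, 'meets': 1, 'the': 1, 'again': 1}
```

With a real file you can pass the file object directly, since iterating over it yields lines: `with open('test.txt') as f: print(word_frequencies(f))`.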

Good luck!

In case you want to see another way to do this: it's not exactly line by line and word by word as you requested, but you should be aware of the collections module, which can be very useful.

from collections import Counter

# create an empty Counter
c = Counter()
with open('myfile.txt', 'r') as f:
    for line in f:
        # do all the cleaning you need here
        c.update(line.lower().split())

# get whatever statistics you want, for example the 10 most common words:
print(c.most_common(10))
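The Counter loop leaves the cleaning step open. One possible way to fill it in, sketched under the assumption that you want the same lowercasing and punctuation/digit stripping as the earlier answers; the `count_with_counter` name and the sample lines are hypothetical:

```python
import string
from collections import Counter

def count_with_counter(lines):
    c = Counter()
    for line in lines:
        # Lowercase, split into words, strip surrounding punctuation/digits,
        # and drop tokens that end up empty.
        words = (w.strip(string.punctuation + string.digits) for w in line.lower().split())
        c.update(w for w in words if w)
    return c

c = count_with_counter(["The cat and the hat", "the bat!"])
print(c.most_common(1))  # [('the', 3)]
```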
