Reading words from a file into a dictionary

Question

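For an assignment, my professor wants us to read a text file line by line, then word by word, and build a dictionary that counts how often each word occurs. This is what I have so far: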

wordcount = {}
with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        for word in line.split():
            line = line.lower()
            word = word.strip(string.punctuation + string.digits)
            if word:
                wordcount[word] = line.count(word)
    return wordcount

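What happens is that my dictionary tells me how many times each word occurs in one particular line, so I end up with almost nothing but 1s even though some words occur many times across the whole text. How can I make the dictionary count words over the entire text rather than just one line?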

Answer 1

The problem is that you reset the count every time. The fix is quite simple:

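import string

# This is the body of the question's function (the `def` line was not shown).
wordcount = {}
with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        # Lowercase before splitting so "Boy" and "boy" share one key.
        for word in line.lower().split():
            word = word.strip(string.punctuation + string.digits)
            if word:
                if word in wordcount:
                    # Seen before: add one for this occurrence.
                    wordcount[word] += 1
                else:
                    # First occurrence anywhere in the file.
                    wordcount[word] = 1
    return wordcount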
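
To try the accumulate-across-lines idea without a file on disk, here is a minimal sketch. The function name `count_words` and the sample text are made up for illustration; the function accepts any iterable of lines, so an open file object would work the same way.

```python
import io
import string


def count_words(lines):
    """Count words across all lines (hypothetical helper, not from the question)."""
    wordcount = {}
    strip_chars = string.punctuation + string.digits
    for line in lines:
        for word in line.lower().split():
            word = word.strip(strip_chars)
            if word:
                # .get() returns 0 for unseen words, so one line covers both cases.
                wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


# Made-up sample text standing in for the real file.
sample = io.StringIO("The cat sat.\nThe dog saw the cat!\n")
counts = count_words(sample)
print(counts)
```

With the sample above, "the" is counted three times even though it never appears more than twice on a single line.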

Answer 2

The problem is in this line:

wordcount[word] = line.count(word)

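Every time that line runs, whatever value wordcount[word] held is replaced by line.count(word), when what you want is to add to it. Try changing it to:

wordcount[word] = wordcount.get(word, 0) + 1

Using .get(word, 0) avoids a KeyError the first time a word is seen, and adding 1 on each pass through the inner loop counts every occurrence exactly once.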
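
One more thing worth checking if you keep line.count(word): str.count matches substrings, not whole words, so short words can pick up hits inside longer ones. A quick sketch with a made-up line:

```python
# Made-up example line; "the" also occurs inside "other" and "theme".
line = "the other theme"

substring_hits = line.count("the")           # counts every substring match
whole_word_hits = line.split().count("the")  # counts whole tokens only

print(substring_hits, whole_word_hits)
```

Here the substring count is 3 while the whole-word count is 1.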

Answer 3

Here is how I would do it:

import string

# Translation table that deletes every punctuation character (Python 3).
remove_punct = str.maketrans("", "", string.punctuation)

wordcount = {}
with open('test.txt', 'r') as f:
    for line in f:
        line = line.lower()  # I suppose you want boy and Boy to be the same word
        for word in line.split():
            # what if your word has funky punctuation chars next to it?
            word = word.translate(remove_punct)
            if not word:
                continue  # the token was punctuation only
            # if it's already in the dict, increase the number
            try:
                wordcount[word] += 1
            # if it's not, this is the first time we are adding it
            except KeyError:
                wordcount[word] = 1

print(wordcount)
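
Note that removing punctuation with translate behaves differently from the strip call in the question: strip only trims characters from the two ends of a word, while translate deletes them everywhere. A small sketch on a made-up token, using the Python 3 spelling `str.maketrans`:

```python
import string

# Table that maps every punctuation character to None (i.e. deletes it).
table = str.maketrans("", "", string.punctuation)

word = "(don't!)"  # made-up token for illustration
stripped = word.strip(string.punctuation)  # trims the ends only
translated = word.translate(table)         # removes punctuation everywhere

print(stripped, translated)
```

Which one you want depends on whether "don't" and "dont" should count as the same word.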

Good luck!

Answer 4

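In case you would like to see another approach: it does not go line by line and word by word in quite the way you were asked to, but it is worth knowing that the collections module can be very handy.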

from collections import Counter

# instantiate an empty Counter
c = Counter()
with open('myfile.txt', 'r') as f:
    for line in f:
        # Do all the cleaning you need here
        c.update(line.lower().split())

# Get whatever statistics you want, for example the 10 most common words:
print(c.most_common(10))
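
As one possible way to fill in the cleaning step, here is a sketch that reuses the strip call from the question. The sample text is made up and read from an in-memory buffer instead of a file, purely so the snippet runs on its own.

```python
import io
import string
from collections import Counter

strip_chars = string.punctuation + string.digits
c = Counter()

# Made-up sample text standing in for the real file.
sample = io.StringIO("Boy meets boy.\nThe boy left; the end.\n")
for line in sample:
    # Lowercase, split on whitespace, then trim punctuation and digits.
    words = (w.strip(strip_chars) for w in line.lower().split())
    # Skip tokens that are empty after trimming.
    c.update(w for w in words if w)

print(c.most_common(2))
```

Because Counter.update adds to existing counts rather than replacing them, the totals accumulate over the whole text with no explicit if/else.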


Answer 1: score 3, accepted, 2015-06-21 23:49:47

Answer 2: score 1, 2015-06-21 23:50:08

Answer 3: score 1, 2015-06-22 00:28:26

Answer 4: score 0, 2015-06-23 19:57:01
