Reading words from a file into a dictionary

Question

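For our assignment, my professor wants us to read a text file line by line, then word by word, and build a dictionary that counts how often each word occurs. This is what I have so far: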

wordcount = {}
with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        for word in line.split():
            line = line.lower()
            word = word.strip(string.punctuation + string.digits)
            if word:
                wordcount[word] = line.count(word)
    return wordcount

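What happens is that my dictionary tells me how many times each word occurs in one particular line, so I end up with almost nothing but 1s even though some words occur many times in the whole text. How can I make the dictionary count words across the entire text rather than just a single line?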

Answer 1

The problem is that you reset the count every time. The fix is very simple:

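import string

# Function wrapper assumed, since the snippet ends in a return statement
def count_words(path='/Users/user/Desktop/Text.txt'):
    wordcount = {}
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            # Lowercase before splitting so 'Boy' and 'boy' are the same word
            for word in line.lower().split():
                word = word.strip(string.punctuation + string.digits)
                if word:
                    if word in wordcount:
                        # Seen before: add 1 per token to the running total
                        # (line.count(word) would recount repeats and substrings)
                        wordcount[word] += 1
                    else:
                        # First occurrence anywhere in the file
                        wordcount[word] = 1
    return wordcount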
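
To check that counts accumulate over the whole text rather than per line, the same loop can be run against an in-memory file. This is only a minimal sketch: `count_words` is a hypothetical helper name, `io.StringIO` stands in for the real file, and the sample text is made up for illustration.

```python
import io
import string


def count_words(lines):
    """Count words across every line of an iterable of strings (hypothetical helper)."""
    wordcount = {}
    for line in lines:
        for word in line.lower().split():
            # Trim punctuation and digits from both ends of the token
            word = word.strip(string.punctuation + string.digits)
            if word:
                # dict.get supplies 0 the first time a word is seen
                wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount


# io.StringIO behaves like an open text file, so it can be iterated line by line
sample = io.StringIO("The cat sat.\nThe dog sat, too.\n")
counts = count_words(sample)
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'too': 1}
```

Passing any iterable of lines instead of a path keeps the counting logic easy to test; an open file object works the same way.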

Answer 2

The problem is in this line:

wordcount[word] = line.count(word)

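Every time that line runs, whatever value wordcount[word] held is replaced by line.count(word), when what you want is to add to it. Try changing it to:

wordcount[word] = wordcount.get(word, 0) + 1

Using .get(word, 0) avoids a KeyError the first time a word is seen. Adding 1 per word, rather than line.count(word), avoids counting a word several times over when it is repeated on the same line.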
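
The difference between overwriting and accumulating can be seen on a couple of made-up lines. The sketch below is illustrative only (the variable names and sample strings are assumptions, not part of the question); it also shows why `str.count` is a risky way to count words, since it matches substrings.

```python
line = "the other theme"

# str.count matches substrings, so 'the' inside 'other' and 'theme' is counted too
substring_hits = line.count("the")

# Counting whole tokens after split() only matches the standalone word
token_hits = line.split().count("the")

# Plain assignment keeps only the last line's value; accumulation keeps the total
lines = ["a b a", "a c"]
overwritten, accumulated = {}, {}
for text in lines:
    for word in text.split():
        overwritten[word] = text.split().count(word)
        accumulated[word] = accumulated.get(word, 0) + 1

print(substring_hits, token_hits)  # 3 1
print(overwritten)                 # {'a': 1, 'b': 1, 'c': 1}
print(accumulated)                 # {'a': 3, 'b': 1, 'c': 1}
```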

Answer 3

This is what I would do:

import string

wordcount = {}
with open('test.txt', 'r') as f:
    for line in f:
        line = line.lower()  # I suppose you want boy and Boy to be the same word
        for word in line.split():
            # What if your word has funky punctuation characters next to it?
            word = word.translate(str.maketrans("", "", string.punctuation))
            # If it's already in the dict, increase the number
            try:
                wordcount[word] += 1
            # If it's not, this is the first time we are adding it
            except KeyError:
                wordcount[word] = 1

print(wordcount)

Good luck!
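
Note that removing punctuation with `translate` and trimming it with `strip` are not equivalent. A small sketch, assuming Python 3 (where the table is built with `str.maketrans`) and a made-up sample token:

```python
import string

# Translation table that deletes every ASCII punctuation character (Python 3 form)
table = str.maketrans("", "", string.punctuation)

token = '"don\'t!"'
stripped = token.strip(string.punctuation)  # only trims the ends
translated = token.translate(table)         # removes punctuation everywhere

print(stripped, translated)  # don't dont
```

Which behavior is preferable depends on whether contractions and hyphenated words should survive as written.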

Answer 4

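In case you want to see another approach: it is not exactly the line-by-line, word-by-word method you were asked for, but you should be aware that the collections module can be very useful at times.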

from collections import Counter

# Instantiate an empty Counter
c = Counter()
with open('myfile.txt', 'r') as f:
    for line in f:
        # Do all the cleaning you need here
        c.update(line.lower().split())

# Get all the statistics you want, for example the 10 most common words:
print(c.most_common(10))
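
One possible way to fill in the cleaning step is to trim punctuation from each token before updating the counter. This is a sketch under assumptions: the sample text is invented, and `io.StringIO` stands in for the real file.

```python
import io
import string
from collections import Counter

c = Counter()
sample = io.StringIO("Boy meets boy.\nThe boy waves!\n")  # stand-in for the real file
for line in sample:
    words = (w.strip(string.punctuation) for w in line.lower().split())
    # Skip tokens that were pure punctuation and are now empty
    c.update(w for w in words if w)

print(c.most_common(1))  # [('boy', 3)]
```

A `Counter` is a `dict` subclass, so `c['boy']` works as usual, and looking up a word that was never seen returns 0 instead of raising `KeyError`.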

4 answers

Answer 1: 3, accepted, 2015-06-21 23:49:47
Answer 2: 1, 2015-06-21 23:50:08
Answer 3: 1, 2015-06-22 00:28:26
Answer 4: 0, 2015-06-23 19:57:01
