Read words from a .txt file and count each word

Question

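I would like to know how to read strings the way fscanf does. I need to read every word in a .txt file, and I need to count how many times each word occurs.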

import collections

collectwords = collections.defaultdict(int)

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        v = ""
        for char in line:
            if str(char) != " ":
                v = v + str(char)
            elif str(char) == " ":
                collectwords[v] += 1
                v = ""

This way, I can't read the last word.

Answer 1

Uh, like this?

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        for word in line.split():
            collectwords[word] += 1
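
A note on why this works where the character loop in the question does not: that loop only records a word when it reaches a space, so the last word of each line, which is followed by a newline rather than a space, is never stored. Calling split() with no arguments splits on any run of whitespace, newline included. Here is a minimal self-contained sketch of the same idea; the function names count_words and count_words_in_file are made up for illustration.

```python
import collections


def count_words(lines):
    """Count whitespace-separated words in an iterable of lines."""
    counts = collections.defaultdict(int)
    for line in lines:
        # split() with no argument splits on any whitespace run,
        # so the trailing newline never ends up inside a word.
        for word in line.split():
            counts[word] += 1
    return counts


def count_words_in_file(path):
    """Hypothetical helper: count the words in the text file at `path`."""
    with open(path, 'r') as filetxt:
        return count_words(filetxt)
```

Something like count_words_in_file('DatoSO.txt') would then return the counts for the file from the question.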

Answer 2

If you are using Python >= 2.7, you could also consider collections.Counter:

http://docs.python.org/library/collections.html#collections.Counter

It adds a number of methods, such as most_common, that can be useful in this kind of application.

From Doug Hellmann's PyMOTW:

import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

# Parenthesized print works on both Python 2.7 and Python 3.
print('Most common:')
for letter, count in c.most_common(3):
    print('%s: %7d' % (letter, count))

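http://www.doughellmann.com/PyMOTW/collections/counter.html (although this counts letters rather than words). In the c.update line, you would need to replace line.rstrip().lower() with line.split(), and perhaps add some code to get rid of punctuation.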
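
As a rough sketch of that substitution, the following counts words instead of letters. The function name most_common_words and its parameter n are invented for this example, and it takes any iterable of lines so that it can be tried without a file.

```python
import collections


def most_common_words(lines, n=3):
    """Return the n most frequent words in an iterable of lines."""
    c = collections.Counter()
    for line in lines:
        # split() yields a list of words, so update() counts words;
        # passing the bare string instead would count characters.
        c.update(line.split())
    return c.most_common(n)
```

With a file, this could be called as most_common_words(f) inside a with open(...) as f: block.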

Edit: to strip punctuation, this is probably the fastest solution:

import collections
import string

c = collections.Counter()
with open('DataSO.txt', 'rt') as f:
    for line in f:
        # Python 2 only: str.translate(table, deletechars) with string.maketrans.
        c.update(line.translate(string.maketrans("",""), string.punctuation).split())

(Borrowed from the following question: Best way to strip punctuation from a string in Python)
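
The two-argument translate call above is the Python 2 form. On Python 3, string.maketrans no longer exists and str.translate takes a single table, so a possible equivalent looks like the sketch below. The names _STRIP_PUNCT and count_words_no_punct are made up here. Only the ASCII characters in string.punctuation are removed, so a word like "don't" becomes "dont".

```python
import collections
import string

# Translation table that deletes every ASCII punctuation character.
# The three-argument str.maketrans is the Python 3 form.
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def count_words_no_punct(lines):
    """Count words in an iterable of lines after removing punctuation."""
    c = collections.Counter()
    for line in lines:
        c.update(line.translate(_STRIP_PUNCT).split())
    return c
```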

Answer 3

Python makes this easy:

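collectwords = []

# Gather every whitespace-separated word into one flat list.
# Note: this collects the words; it does not count them yet.
with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        collectwords.extend(line.split())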
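
The loop above only gathers the words into a list; the counting step is left out. One possible way to finish, assuming Python 2.7 or later, is to hand the list to collections.Counter. The sample list below is illustrative data standing in for the words read from the file.

```python
import collections

# Stand-in for the list built by the loop above (illustrative data).
collectwords = ['spam', 'eggs', 'spam']

# Counter tallies how many times each item occurs in the list.
counts = collections.Counter(collectwords)
```

Looking up a word that never occurred, such as counts['missing'], gives 0 rather than raising KeyError.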

Answer 1: 3, 2011-03-27 21:28:59
Answer 2: 3, accepted, 2011-03-27 21:57:44
Answer 3: 1, 2011-03-27 21:27:13