Python: finding the most common words

Question

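I want to read a file and find the most frequently used words. The code is below. I assume I made some mistake in reading the file. Any suggestions would be much appreciated.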

txt_file = open('result.txt', 'r')

for line in txt_file:
    for word in line.strip().split():
        word = word.strip(punctuation).lower()

    all_words = nltk.FreqDist(word for word in word.words())
    top_words = set(all_words.keys()[:300])
    print top_words

Input file result.txt:

Musik to shiyuki miyama opa samba japan obi Musik Musik Musik 
Antiques    antique 1900 s sewing pattern pictorial review size Musik 36 bust 1910 s ladies waist bust

Answer 1

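I'm not sure what your error is, and I don't know how to use NLTK, but your approach of looping over each line and then over each word can be adapted to use a simple Python dictionary to keep track of the counts: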

from string import punctuation

wordFreq = {}
# Iterate over the file directly; reading all lines first would exhaust it
with open("filename", "r") as txt_file:
    for line in txt_file:
        for word in line.strip().split():
            word = word.strip(punctuation).lower()
            # If word is already in dict, increase count
            if word in wordFreq:
                wordFreq[word] += 1
            else:    # Otherwise, add word to dict and initialize count to 1
                wordFreq[word] = 1

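To look up a result, use the word you are interested in as the key into the dict. Because the words are lowercased before counting, the key must be lowercase too, e.g. wordFreq['musik'].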
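
To get from these counts to the most frequent words, one option is to sort the dict items by count. This is only a minimal sketch, assuming the counts are already in a dict like wordFreq above; the helper name top_n and the sample counts are made up for illustration.

```python
def top_n(word_freq, n):
    """Return the n (word, count) pairs with the highest counts."""
    # Sort by count, highest first; ties are broken alphabetically so the
    # result is deterministic
    ranked = sorted(word_freq.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Hypothetical counts, similar to what the loop above would build
sample_freq = {"musik": 5, "bust": 2, "s": 2, "antique": 1}
print(top_n(sample_freq, 2))  # [('musik', 5), ('bust', 2)]
```

Sorting the whole dict is fine for small files; for very large vocabularies, heapq.nlargest would avoid a full sort.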

Answer 2

from collections import Counter

# Collect every whitespace-separated word in the file
with open('result.txt', 'r') as txt_file:
    words = [word for line in txt_file for word in line.strip().split()]
print(Counter(words).most_common(1))

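You can pass any number to most_common instead of 1, and it will return that many of the most common words. For example

print(Counter(words).most_common(1))

gives

[('Musik', 5)]

whereas

print(Counter(words).most_common(5))

gives (the order of words with equal counts may vary between Python versions)

[('Musik', 5), ('bust', 2), ('s', 2), ('antique', 1), ('ladies', 1)]

The number is actually an optional argument. If it is omitted, most_common() returns every word with its count, ordered from most to least frequent.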
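
Note that the snippet in this answer counts raw tokens, so 'Musik' and 'musik,' would be counted as different words. The following sketch combines Counter with the punctuation stripping and lowercasing from the question. The function name count_words and the sample lines are assumptions for illustration; the function accepts any iterable of lines, such as an open file object.

```python
from collections import Counter
from string import punctuation


def count_words(lines):
    """Count normalized words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            # Strip surrounding punctuation and lowercase, as in the question
            word = word.strip(punctuation).lower()
            if word:  # skip tokens that consisted only of punctuation
                counts[word] += 1
    return counts


# Sample lines made up for illustration
sample = ["Musik, musik! Antiques antique", "Musik bust bust"]
counts = count_words(sample)
print(counts.most_common(2))  # [('musik', 3), ('bust', 2)]
```

To run it on the file from the question, something like count_words(open('result.txt')) should work, ideally inside a with block so the file is closed afterwards.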

Answer 1
1 2013-09-11 04:22:26

Answer 2
1 2013-09-11 04:35:10
