Counting the frequency of words in a text file in Python

Question

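I'm trying to figure out how to write a program that opens a file chosen by the user (by typing its file name) and counts the frequency of each word the user enters.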

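I have most of it working, but when I enter more than one word for the program to look for, only the first word shows the correct frequency. The rest show as "0 occurrences".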

file_name = input("What file would you like to open? ")
f = open(file_name, "r")
the_full_text = f.read()
words = the_full_text.split()
search_word = input("What words do you want to find? ").split(",")
len_list = len(search_word) 

word_number = 0
print()
print ('... analyzing ... hold on ...')
print()
print ('Frequency of word usage within', file_name+":")
for i in range(len_list):

    frequency = 0
    for word in words:
        word = word.strip(",.")
        if search_word[word_number].lower() == word.lower():
            frequency += 1
    print ("   ",format(search_word[word_number].strip(),'<20s'),"/", frequency, "occurrences")
    word_number = word_number + 1

For example, a sample run looks like this:

What file would you like to open? assignment_8.txt
What words do you want to find? wey, rights, dem

... analyzing ... hold on ...

Frequency of word usage within assignment_8.txt:
    wey                  / 96 occurrences
    rights               / 0 occurrences
    dem                  / 0 occurrences

What's wrong with my program? Please help :o

Answer 1

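You need to strip the spaces from the search words. Splitting the input "wey, rights, dem" on commas gives 'wey', ' rights' and ' dem'. Because of the leading space, the second and third search words can never equal a word from the text, so they always report 0 occurrences. The print call strips them, which hides the space in the output.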
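The effect of the missing strip() can be reproduced without a file. The sketch below is illustrative only: parse_search_words and count_occurrences are hypothetical helper names and the sample text is made up. count_occurrences uses the same comparison as the question's inner loop.

```python
# Hypothetical helpers for illustration only (not from the question or answer).
def parse_search_words(raw, strip=True):
    """Split a comma-separated string of search words.

    With strip=False this mirrors the question's input(...).split(",");
    with strip=True each piece has its surrounding whitespace removed.
    """
    parts = raw.split(",")
    if strip:
        parts = [part.strip() for part in parts]
    return parts


def count_occurrences(words, target):
    """Count words equal to target, ignoring case and edge ',' or '.' (as in the question)."""
    return sum(1 for word in words if word.strip(",.").lower() == target.lower())


# Made-up sample text standing in for the file contents.
text_words = "Wey, dem rights. wey dem".split()
user_entry = "wey, rights, dem"

print(parse_search_words(user_entry, strip=False))  # ['wey', ' rights', ' dem']
print(parse_search_words(user_entry))               # ['wey', 'rights', 'dem']

# Without stripping, only the first search word can ever match.
for target in parse_search_words(user_entry, strip=False):
    print(repr(target), count_occurrences(text_words, target))
```

With strip=True the same loop reports 2, 1 and 2 for "wey", "rights" and "dem".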

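However, your current algorithm is inefficient, because it rescans the entire text once for every search word. Here is a more efficient approach. First, we clean up the search words and put them in a list. Then we build a dictionary from that list to hold the count of each word as we find it in the text file.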

file_name = input("What file would you like to open? ")
with open(file_name, "r") as f:
    words = f.read().split()

# Split on commas, then strip surrounding spaces and normalise case
search_words = input("What words do you want to find? ").split(',')
search_words = [word.strip().lower() for word in search_words]
#print(search_words)
# One counter per search word, all starting at 0
search_counts = dict.fromkeys(search_words, 0)

print ('\n... analyzing ... hold on ...')
# Single pass over the text: a dict lookup replaces the inner loop
for word in words:
    # Drop trailing commas/periods and compare case-insensitively
    word = word.rstrip(",.").lower()
    if word in search_counts:
        search_counts[word] += 1

print ('\nFrequency of word usage within', file_name + ":")
for word in search_words:
    print("   {:<20s} / {} occurrences".format(word, search_counts[word]))
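A possible variation, not part of the answer above: the standard library's collections.Counter can do the single-pass count, at the cost of storing a count for every distinct word in the file rather than only the search words. This is a sketch under stated assumptions: search_word_frequencies is a hypothetical name, it takes the text as a string instead of opening a file, and it strips all of string.punctuation from both ends of each word, whereas the answer strips only trailing commas and periods.

```python
import string
from collections import Counter


def search_word_frequencies(text, raw_search):
    """Count how often each comma-separated search word appears in text.

    Assumption: words are lower-cased and stripped of leading/trailing
    characters in string.punctuation (the answer strips only trailing ',' and '.').
    """
    search_words = [w.strip().lower() for w in raw_search.split(",")]
    # Count every word of the text in a single pass.
    counts = Counter(word.strip(string.punctuation).lower() for word in text.split())
    # Counter returns 0 for missing keys, so absent search words need no special case.
    return {word: counts[word] for word in search_words}


sample_text = "Wey, dem rights. Wey dem! (rights)"
for word, n in search_word_frequencies(sample_text, "wey, rights, dem").items():
    print("   {:<20s} / {} occurrences".format(word, n))
```

Because the result is a dict, a search word entered twice is reported once, while the answer's loop over search_words prints it twice.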

Answer 1: 2016-12-07 08:02:48
