
Python function to take a text file and create a dictionary with words as keys and frequencies as values

First of all, I apologize in advance if anything in my question is hard to understand. I'm a beginner in Python, and it's late, so I'm tired.

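I'm trying to figure out why I keep getting an error with this function. It should take a text file, build a dictionary of words and their frequencies, and print the word that appears most often in the file.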

Here is my code:

def poet(filename):
    word_frequency = {}
    with open(filename,'r') as f:
        for line in f:
            for word in line.split():
                word = word.replace('.',"")
                word = word.replace(',',"")
                word = word.replace(';',"")
                if word in word_frequency:
                    word_frequency[word] += 1;
                else:
                    word_frequency[word] = 1;
most_freq_word = max(word_frequency, key=word_frequency)
print("The word " + most_freq_word + " is in text ")
str(word_frequency[most_freq_word]) + " times"
print(word_frequency)


poet('Poem.txt')

Here is the error I get:

Traceback (most recent call last):
  File "C:/Users/Noah/Desktop/Python/3.py", line 20, in <module>
    str(word_frequency[most_freq_word]) + " times"
NameError: name 'word_frequency' is not defined

Also, if anything is unclear, please comment and I will reply right away. Thanks in advance.

Edit:

Thanks for the replies. I have applied them to my code, but now I get this error:

Traceback (most recent call last):
  File "C:/Users/Noah/Desktop/Python/3.py", line 20, in <module>
    poet('FrostPoem.txt')
  File "C:/Users/Noah/Desktop/Python/3.py", line 14, in poet
    most_freq_word = max(word_frequency, key=word_frequency)
TypeError: 'dict' object is not callable

The new code is:

def poet(filename):
    word_frequency = {}
    with open(filename,'r') as f:
        for line in f:
            for word in line.split():
                word = word.replace('.',"")
                word = word.replace(',',"")
                word = word.replace(';',"")
                if word in word_frequency:
                    word_frequency[word] += 1;
                else:
                    word_frequency[word] = 1;

    most_freq_word = max(word_frequency, key=word_frequency)
    print("The word " + most_freq_word + " is in text " + \
    str(word_frequency[most_freq_word]) + " times")
    print(word_frequency)


poet('Poem.txt')

Aha, here's your problem: several of your lines should be inside the function, like this:

def poet(filename):
    word_frequency = {}
    with open(filename, 'r') as f:
        for line in f:
            for word in line.split():
                # Remove punctuation so "deep." and "deep" count as the same word
                word = word.replace('.', "")
                word = word.replace(',', "")
                word = word.replace(';', "")
                if word in word_frequency:
                    word_frequency[word] += 1
                else:
                    word_frequency[word] = 1

    # Indented, so these lines run inside poet() where word_frequency exists
    most_freq_word = max(word_frequency, key=lambda x: word_frequency[x])
    print("The word " + most_freq_word + " is in text " +
          str(word_frequency[most_freq_word]) + " times")
    print(word_frequency)


poet('Poem.txt')

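Now, you may want to make this function more reusable: for example, instead of printing right away, you might want to do further work with word_frequency. In that case you need a return statement, and your code might look like this: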

def poet(filename):
    word_frequency = {}
    with open(filename, 'r') as f:
        for line in f:
            for word in line.split():
                word = word.replace('.', "")
                word = word.replace(',', "")
                word = word.replace(';', "")
                if word in word_frequency:
                    word_frequency[word] += 1
                else:
                    word_frequency[word] = 1

    # Hand the dictionary back to the caller instead of printing it here
    return word_frequency

word_freq = poet('Poem.txt')
most_freq_word = max(word_freq, key=lambda x: word_freq[x])
print("The word " + most_freq_word + " is in text " +
      str(word_freq[most_freq_word]) + " times")
print(word_freq)

In response to your edit, replace this line

    most_freq_word = max(word_frequency, key=word_frequency)

with this line

    most_freq_word = max(word_frequency, key=lambda x:word_frequency[x])

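The key argument must be a callable: max() calls it on each key (each word) and returns the word whose value, i.e. its frequency, is largest.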
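A small sketch of what key= changes, using a made-up dictionary in place of the one built from Poem.txt. It also shows the tie behaviour, which the answer above does not mention: max() keeps the first maximal key it meets.

```python
# Made-up counts standing in for the word_frequency dict built by poet().
word_frequency = {"woods": 2, "the": 3, "snow": 1}

# max() iterates over the keys; key= says what to compare for each key.
most_freq_word = max(word_frequency, key=lambda x: word_frequency[x])
print(most_freq_word)  # the

# Without key=, max() compares the words themselves, alphabetically.
print(max(word_frequency))  # woods

# On a tie, max() returns the first maximal key it encounters,
# which for a dict is the earliest inserted one.
ties = {"woods": 2, "snow": 2}
print(max(ties, key=lambda x: ties[x]))  # woods
```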

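When you run str(word_frequency[most_freq_word]) + " times", Python expects word_frequency to have been defined earlier in that scope. In your case, word_frequency is defined inside the poet function, so it does not exist at module level.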
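A minimal sketch of that scope rule. count_words is a hypothetical stand-in for poet() that counts words in a string instead of a file; the point is that the local dict only becomes reachable outside once it is returned and bound to a name.

```python
def count_words(text):
    """Hypothetical stand-in for poet(): counts words in a string."""
    word_frequency = {}  # local: exists only while this call runs
    for word in text.split():
        word_frequency[word] = word_frequency.get(word, 0) + 1
    return word_frequency  # hand the local dict back to the caller


count_words("the woods the snow")
# The call's local word_frequency is gone; nothing by that name exists here.
defined_outside = "word_frequency" in globals()
print(defined_outside)  # False

# Binding the return value gives the caller its own name for the dict.
word_frequency = count_words("the woods the snow")
print(word_frequency)  # {'the': 2, 'woods': 1, 'snow': 1}
```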

Check whether you have an indentation problem.
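To see why indentation matters here, this sketch runs two versions of a tiny function that differ only in how one line is indented. The helper run and the strings inside and outside are made up for illustration; exec() is used only so both versions can be tried side by side.

```python
def run(source):
    """Execute source in a fresh namespace; return the error type name or None."""
    try:
        exec(source, {})
    except Exception as exc:
        return type(exc).__name__
    return None


# Indented: both lines belong to poet(), which is then called.
inside = (
    "def poet():\n"
    "    word_frequency = {'the': 3}\n"
    "    print(word_frequency['the'])\n"
    "poet()\n"
)

# Dedented print: it runs at module level, where word_frequency does not exist.
outside = (
    "def poet():\n"
    "    word_frequency = {'the': 3}\n"
    "print(word_frequency['the'])\n"
    "poet()\n"
)

print(run(inside))   # prints 3, then None
print(run(outside))  # NameError
```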

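The key argument needs a function that maps each dictionary key to its value; passing the dictionary itself fails because a dict is not callable.
To fix it, use key=word_frequency.get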
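A short sketch reproducing the TypeError from the edit and the .get fix, on a made-up dictionary:

```python
word_frequency = {"woods": 2, "the": 3, "snow": 1}

error_message = None
try:
    # max() calls key(word) for every word, and a dict cannot be called.
    max(word_frequency, key=word_frequency)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'dict' object is not callable

# word_frequency.get is a bound method: get("the") returns 3, and so on,
# so max() compares the counts and returns the word with the largest one.
most_freq_word = max(word_frequency, key=word_frequency.get)
print(most_freq_word)  # the
```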

You defined word_frequency inside the function poet(), so its scope is local, but you use the dictionary outside the function, which raises the error.

def poet(filename):
    word_frequency = {}
    with open(filename, 'r') as f:
        for line in f:
            for word in line.split():
                word = word.replace('.', "")
                word = word.replace(',', "")
                word = word.replace(';', "")
                if word in word_frequency:
                    word_frequency[word] += 1
                else:
                    word_frequency[word] = 1
    # Still inside poet(), so word_frequency is in scope here
    most_freq_word = max(word_frequency, key=word_frequency.get)
    print("The word " + most_freq_word + " is in text " +
          str(word_frequency[most_freq_word]) + " times")
    print(word_frequency)

poet('Poem.txt')

Put all the statements inside the function and it should work.

You can use Counter as follows:

from collections import Counter

def poet(filename):
    with open(filename, "r") as f:
        counter = Counter(f.read().split())
    return counter

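If you want to get rid of characters such as ',' or ';', just strip them from each word, for example by mapping a stripping function over the list of words.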
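A sketch of that stripping step, assuming you only want punctuation removed from the start and end of each word (so an inner apostrophe, as in "don't", survives). It swaps the chain of replace() calls for str.strip() with string.punctuation, writes a temporary file in place of Poem.txt, and the sample text is made up.

```python
import os
import string
import tempfile
from collections import Counter


def poet(filename):
    """Count words in a file, stripping punctuation from the ends of each word."""
    with open(filename, "r") as f:
        # strip(string.punctuation) removes characters such as ',' ';' '.'
        # from both ends of a token but leaves inner ones alone.
        words = (word.strip(string.punctuation) for word in f.read().split())
        # A token made only of punctuation becomes "", so skip it.
        return Counter(word for word in words if word)


# Write a small sample file so the example runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the woods, the snow;\nthe end .\n")
    path = tmp.name

try:
    counts = poet(path)
finally:
    os.remove(path)

print(counts.most_common(1))  # [('the', 3)]
```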

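word_frequency is only defined within the scope of the poet function. To access it outside the function, return it from poet() (end the function with return word_frequency) and assign the result: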

# Assumes poet() ends with "return word_frequency"
word_frequency = poet('Poem.txt')
most_freq_word = max(word_frequency, key=word_frequency.get)
print("The word " + most_freq_word + " is in text " +
      str(word_frequency[most_freq_word]) + " times")
print(word_frequency)

There is an even better solution to your problem: take a look at collections.Counter. The example in its documentation does exactly what you want.
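For the "most frequent word" part specifically, Counter.most_common() replaces the max() call. A minimal sketch on made-up text standing in for the poem:

```python
from collections import Counter

# Made-up sample text standing in for the contents of Poem.txt.
text = "the woods are lovely dark and deep the woods the snow"
counter = Counter(text.split())

# most_common(1) returns a one-element list holding the (word, count) pair
# with the highest count.
most_freq_word, count = counter.most_common(1)[0]
print("The word " + most_freq_word + " is in text " + str(count) + " times")
# The word the is in text 3 times
```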

