
How do I return a list of tuples containing a phrase, and the number of times it appears in descending order?

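Here is my assignment prompt:

"Write a function named phrase_freq that takes two filename strings as input. The first filename refers to a file containing a book. The second filename refers to a file containing phrases, one phrase per line. The function uses a dictionary to map each phrase in the second file to an integer representing the number of times the phrase appears in the first file. Phrases should be counted regardless of case (for example, if the phrase is "The Goops", then the phrases "The goops", "the Goops", and "goops" should be counted). The function returns a list of tuples containing each phrase and the number of times it appears, ordered from most to fewest occurrences."

I think my thought process is right, but I'm not quite there yet.

Here is my current code: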


def phrase_freq(file1, file2):   
#1. Initialize a dictionary. 
    a_dict = {}
#2. Open the file containing the phrases from the book.
    in_file = open(file1)
#3. Iterate through the data in the phrases file. 
    for data in in_file:
#4. Add this data into the dictionary.
        a_dict = data
#5. Close this file. 
    in_file.close()
#6. Open the file containing text from book.
    in_file_2 = open(file2)
#7. Assign values to key and val variables in dict. 
    key = data
    val = 0
#8. Iterate through second file of book text. 
    for other_data in in_file_2: 
#9. Determine if phrases from file1 are in the book text. 
        if key in file2: 
#10. Add 1 to the instance which the phrase is found in text. 
            a_dict[key] = a_dict[key] + 1
#11. If not found more than once, keep freq. value at one. 
        else: 
             a_dict[key] = 1  
#Above giving me error. "TypeError: 'str' object does not support item assignment."
#12. Close the book text file. 
    in_file_2.close()   
#13. Return the list of phrases and their frequency in text. 
    return list(a_dict.items())

The output should look like this:

>>> phrase_freq("goops.txt","small_phrase.txt")
[('The Goops', 11)]
>>> phrase_freq("goops.txt","large_phrase.txt")
[('The Goops', 11), ('a Goop', 9), ('your mother', 5), ('you know', 3), ('your father', 3), ('never knew', 2), ('forget it', 1)]

You should build the dictionary from the phrases file, not the book file. The book probably won't contain each phrase alone on its own line.
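
To show where the `TypeError` in the question comes from, here is a minimal sketch. The statement `a_dict = data` does not add anything to the dictionary; it rebinds the name `a_dict` to a string, so the later item assignment fails. The sample lines are made up for illustration and stand in for the phrases file.

```python
# Sample lines standing in for the phrases file (hypothetical data).
lines = ["The Goops\n", "a Goop\n"]

a_dict = {}
for data in lines:
    a_dict = data          # rebinds a_dict to a str; the dictionary is gone

try:
    a_dict["a Goop"] = 1   # item assignment on a str raises TypeError
except TypeError as exc:
    error_message = str(exc)

# Assigning to a key instead keeps the dictionary and gives each phrase a count.
counts = {}
for data in lines:
    counts[data.strip()] = 0
```

After this runs, `error_message` holds the same message as in the question, and `counts` has one zero-initialized entry per phrase.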

When reading from the phrases file, you need to strip each line to remove the trailing newline.

You have to convert everything to a common case so that you can compare the phrases with the book case-insensitively.
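
As a small sketch of these two normalization steps, the snippet below strips the newline from a phrase line and lowercases both sides before comparing. The book line is an invented sample, not text from the actual file.

```python
raw_line = "The Goops\n"           # a line as read from the phrases file
phrase = raw_line.strip()          # newline (and surrounding spaces) removed
key = phrase.lower()               # common case used for comparison

book_line = "THE GOOPS are here today.\n"   # hypothetical line from the book
found = key in book_line.lower()            # case-insensitive match
```

Without `strip()`, the phrase would still end in `"\n"` and would only match when it happens to sit at the very end of a book line.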

You have to test whether a phrase is in the line, not the other way around.
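
The direction of the `in` test matters because it asks whether the left operand occurs inside the right one. A short illustration with an invented line follows; it also shows `str.count`, which may be a better fit when a phrase can occur more than once on the same line.

```python
line = "the goops are goops, and the goops know it"   # hypothetical book line
phrase = "the goops"

phrase_in_line = phrase in line    # True: the short phrase occurs in the line
line_in_phrase = line in phrase    # False: the whole line is not inside the phrase

# `in` only reports whether there is at least one match;
# str.count returns the number of non-overlapping occurrences.
occurrences = line.count(phrase)
```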

Finally, the requirements say the returned list should be sorted by the number of times each phrase was found.
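
For the sorting step, a minimal sketch using a few of the counts from the expected output in the question: `dict.items()` yields `(phrase, count)` tuples, and `operator.itemgetter(1)` tells `sorted` to order them by the count.

```python
import operator

counts = {"you know": 3, "The Goops": 11, "forget it": 1, "a Goop": 9}

# Sort the (phrase, count) pairs by count, largest first.
ranked = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
# ranked == [('The Goops', 11), ('a Goop', 9), ('you know', 3), ('forget it', 1)]
```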

import operator

def phrase_freq(file1, file2):
    a_dict = {}
    # initialize dictionary with all the phrases, keeping their original case
    with open(file2) as phrases:
        for phrase in phrases:
            phrase = phrase.strip()
            if phrase:  # skip blank lines
                a_dict[phrase] = 0
    with open(file1) as book:
        # count each phrase on each line of the book, ignoring case
        for line in book:
            line = line.lower()
            for phrase in a_dict:
                a_dict[phrase] += line.count(phrase.lower())
    # Return list sorted from most to fewest occurrences
    return sorted(a_dict.items(), key=operator.itemgetter(1), reverse=True)
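
One limitation of scanning the book line by line is that a phrase split across a line break is never seen. If that matters for your book file, one option is to read the whole text, collapse all whitespace to single spaces, and count in that. The sketch below is only an illustration under that assumption: `phrase_freq_whole_text` is a hypothetical name, and the two small files are made-up test data written to a temporary directory.

```python
import os
import tempfile

def phrase_freq_whole_text(book_path, phrases_path):
    """Hypothetical variant: count phrases in the whole text, ignoring case."""
    with open(phrases_path) as phrases_file:
        phrases = [line.strip() for line in phrases_file if line.strip()]
    with open(book_path) as book_file:
        # split() + join() collapses newlines and repeated spaces to one space
        text = " ".join(book_file.read().lower().split())
    counts = {phrase: text.count(phrase.lower()) for phrase in phrases}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Small made-up files to try it on.
with tempfile.TemporaryDirectory() as tmp:
    book_path = os.path.join(tmp, "book.txt")
    phrases_path = os.path.join(tmp, "phrases.txt")
    with open(book_path, "w") as f:
        f.write("The Goops are here. the goops\nare there. Did you\nknow that?\n")
    with open(phrases_path, "w") as f:
        f.write("you know\nThe Goops\n")
    result = phrase_freq_whole_text(book_path, phrases_path)
```

Here "you know" is found even though it straddles a line break in the sample book. Reading the whole file at once is fine for a single book, but it does load the entire text into memory.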
