How do I return a list of tuples containing each phrase and the number of times it occurs, in descending order?

Question

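Here is my assignment prompt:

"Write a function named phrase_freq that takes two file-name strings as input. The first file name refers to a file containing a book. The second file name refers to a file containing phrases, one phrase per line. The function uses a dictionary to map each phrase in the second file to an integer representing the number of times the phrase appears in the first file. Phrases should be counted regardless of case (for example, if the phrase is "The Goops", then the phrases "The goops", "the Goops", and "goops" should be counted). The function returns a list of tuples containing each phrase and the number of times it occurs, ordered from most to fewest occurrences."

I think my thought process is right, but I'm not quite there.

Here is my current code: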


def phrase_freq(file1, file2):   
#1. Initialize a dictionary. 
    a_dict = {}
#2. Open the file containing the phrases from the book.
    in_file = open(file1)
#3. Iterate though the data in the phrases file. 
    for data in in_file:
#4. Add this data into the dictionary.
        a_dict = data
#5. Close this file. 
    in_file.close()
#6. Open the file containing text from book.
    in_file_2 = open(file2)
#7. Assign values to key and val variables in dict. 
    key = data
    val = 0
#8. Iterate through second file of book text. 
    for other_data in in_file_2: 
#9. Determine if phrases from file1 are in the book text. 
        if key in file2: 
#10. Add 1 to the instance which the phrase is found in text. 
            a_dict[key] = a_dict[key] + 1
#11. If not found more than once, keep freq. value at one. 
        else: 
             a_dict[key] = 1  
#Above giving me error. "TypeError: 'str' object does not support item assignment."
#12. Close the book text file. 
    in_file_2.close()   
#13. Return the list of phrases and their frequency in text. 
    return list(a_dict.items())

The output should look like this:

>>> phrase_freq("goops.txt","small_phrase.txt")
[('The Goops', 11)]
>>> phrase_freq("goops.txt","large_phrase.txt")
[('The Goops', 11), ('a Goop', 9), ('your mother', 5), ('you know', 3), ('your father', 3), ('never knew', 2), ('forget it', 1)]

Answer 1

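You should build the dictionary from the phrases file, not the book file. The book probably won't have each phrase on a line by itself.

When reading from the phrases file, you need to strip each line to remove the trailing newline.

You have to convert everything to a common case so that you can compare the phrases against the book case-insensitively.

You have to test whether a phrase is in the line, not the other way around.

Finally, the requirements say that the returned list should be sorted by the number of times each phrase was found.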
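The first three points can be seen in isolation with a small sketch. The two sample strings below are made up for illustration and are not taken from the actual files:

```python
# Hypothetical sample data: one line as read from the phrases file
# (newline included) and one line as read from the book.
raw_phrase = "The Goops\n"
book_line = "Are you one of the GOOPS yourself?\n"

# Strip the newline and normalize the case on both sides.
phrase = raw_phrase.strip().lower()
line = book_line.lower()

found_raw = raw_phrase in book_line   # False: case and newline differ
found_normalized = phrase in line     # True once both sides are normalized
reversed_test = line in phrase        # False: the line is not inside the phrase
```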

import operator

def phrase_freq(file1, file2):   
    a_dict = {}
    # initialize dictionary with all the phrases
    with open(file2) as phrases:
        for phrase in phrases:
            a_dict[phrase.strip().lower()] = 0
    with open(file1) as book:
        # check for each phrase on each line of the book
        for line in book:
            line = line.lower()
            for phrase in a_dict:
                if phrase in line:
                    a_dict[phrase] += 1
    # Return sorted list
    return sorted(a_dict.items(), key=operator.itemgetter(1), reverse=True)
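Two details of the code above may matter for the assignment: it adds one per matching line, so a line that contains a phrase twice is counted once, and the returned keys are lower-cased, while the expected output shows the original spelling. The following is a minimal sketch of a variant that counts every occurrence and keeps the spelling from the phrases file. The name `phrase_freq_all` is hypothetical, and whether this is the counting rule the assignment intends is an assumption.

```python
def phrase_freq_all(book_path, phrases_path):
    """Count every case-insensitive occurrence of each phrase in the book."""
    # Keep the original spelling of each phrase for the output;
    # skip blank lines in the phrases file.
    with open(phrases_path) as phrases_file:
        phrases = [line.strip() for line in phrases_file if line.strip()]

    # Read the whole book once and lower-case it for comparison.
    with open(book_path) as book_file:
        text = book_file.read().lower()

    # str.count returns the number of non-overlapping occurrences.
    counts = {phrase: text.count(phrase.lower()) for phrase in phrases}

    # Sort from most to fewest occurrences.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

This is still plain substring matching, so "a Goop" also matches inside "a Goops", and a phrase split across a line break in the book is not found.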
