
How do I return a list of tuples containing a phrase, and the number of times it appears in descending order?

The following is the prompt for my homework:

"Write a function called phrase_freq that takes as input two file name strings. The first file name refers to a file containing a book. The second file name contains phrases, one phrase to a line. The function uses a dictionary to map each phrase in the second file to an integer representing the number of times the phrase appears in the first file. Phrases should be counted regardless of their capitalization (e.g., if the phrase is "The Goops", then the phrases "The goops", "the Goops", and "the goops" should be counted). The function returns a list of tuples containing the phrase and the number of times the phrase appears, sorted from largest number of appearances to the smallest."

I think my thought process is on the right track, but I am not quite there.

The following is the current code that I have:


def phrase_freq(file1, file2):   
#1. Initialize a dictionary. 
    a_dict = {}
#2. Open the file containing the phrases from the book.
    in_file = open(file1)
#3. Iterate through the data in the phrases file.
    for data in in_file:
#4. Add this data into the dictionary.
        a_dict = data
#5. Close this file. 
    in_file.close()
#6. Open the file containing text from book.
    in_file_2 = open(file2)
#7. Assign values to key and val variables in dict. 
    key = data
    val = 0
#8. Iterate through second file of book text. 
    for other_data in in_file_2: 
#9. Determine if phrases from file1 are in the book text. 
        if key in file2: 
#10. Add 1 to the instance which the phrase is found in text. 
            a_dict[key] = a_dict[key] + 1
#11. If not found more than once, keep freq. value at one. 
        else: 
             a_dict[key] = 1  
#Above giving me error. "TypeError: 'str' object does not support item assignment."
#12. Close the book text file. 
    in_file_2.close()   
#13. Return the list of phrases and their frequency in text. 
    return list(a_dict.items())

The output should appear like the following test cases:

>>> phrase_freq("goops.txt","small_phrase.txt")
[('The Goops', 11)]
>>> phrase_freq("goops.txt","large_phrase.txt")
[('The Goops', 11), ('a Goop', 9), ('your mother', 5), ('you know', 3), ('your father', 3), ('never knew', 2), ('forget it', 1)]

You should create the dictionary from the phrase file, not the book file. The book may not contain each phrase alone on a line.
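The TypeError in the question comes from step 4: `a_dict = data` does not add anything to the dictionary, it rebinds the name `a_dict` to a string, so the later `a_dict[key] = 1` tries to assign into a str. A minimal sketch of the difference, using a made-up line as it might be read from the phrase file:

```python
# Hypothetical line, as it would come out of the phrase file.
data = "The Goops\n"

# What the question's code does: rebinding replaces the dict with a string.
a_dict = {}
a_dict = data
try:
    a_dict["The Goops"] = 1
except TypeError as exc:
    # Same message as the one reported in the question.
    error_message = str(exc)

# What was probably intended: add the phrase as a key with a starting count of 0.
a_dict = {}
a_dict[data.strip()] = 0
print(error_message)
print(a_dict)
```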

You need to strip the line when reading from the phrase file, to remove the newlines.
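To see why the stripping matters, here is a small sketch in which `io.StringIO` stands in for the phrase file (the contents are made up). Each line read from a file keeps its trailing newline, and a key like `'The Goops\n'` will essentially never match text inside a line of the book.

```python
import io

# io.StringIO stands in for the phrase file; the contents are made up.
phrase_file = io.StringIO("The Goops\na Goop\n")

# Iterating over a file yields each line with its trailing newline.
raw = [line for line in phrase_file]

phrase_file.seek(0)
# strip() removes the newline (and any other surrounding whitespace).
stripped = [line.strip() for line in phrase_file]

print(raw)
print(stripped)
```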

You have to convert everything to a common case so that you can compare phrases with the book case-insensitively.
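A quick illustration of the case problem, with a made-up book line: the `in` operator on strings is case-sensitive, so both sides need to be lowercased (or both uppercased) before comparing.

```python
phrase = "The Goops"
line = "Have you met the goops today?"  # made-up book line

# Case-sensitive test misses the match because of the capital letters.
case_sensitive = phrase in line

# Lowercasing both sides makes the comparison case-insensitive.
case_insensitive = phrase.lower() in line.lower()

print(case_sensitive, case_insensitive)
```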

You have to test whether a phrase is in the line, not the other way around.
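The direction of the test can be checked on a made-up, already lowercased line. Note also that `in` only says whether the phrase is present at all; if a phrase can occur more than once on a line, `str.count` gives the number of non-overlapping occurrences, which may be closer to what the assignment asks for.

```python
phrase = "a goop"
line = "a goop is a goop, you know"  # made-up line, already lowercased

# Wrong direction: a whole line is not a substring of a short phrase.
line_in_phrase = line in phrase

# Right direction: is the phrase somewhere inside the line?
phrase_in_line = phrase in line

# How many times the phrase occurs in the line (non-overlapping).
occurrences = line.count(phrase)

print(line_in_phrase, phrase_in_line, occurrences)
```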

Finally, the requirements say that the list that's returned should be sorted by the number of times the phrase was found.
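The sorting step on its own looks like this. The counts are taken from the expected output in the question, deliberately placed out of order; `operator.itemgetter(1)` picks the count out of each `(phrase, count)` tuple, and `reverse=True` puts the largest first. Python's sort is stable, so phrases with equal counts keep the order they had in the dictionary.

```python
import operator

# Counts borrowed from the question's expected output, shuffled on purpose.
counts = {"you know": 3, "The Goops": 11, "your father": 3, "forget it": 1}

# Sort the (phrase, count) pairs by count, largest first.
by_count = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)

print(by_count)
```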

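import operator

def phrase_freq(file1, file2):
    a_dict = {}
    # initialize the dictionary with all the phrases, keeping their
    # original capitalization so the output matches the test cases
    with open(file2) as phrases:
        for phrase in phrases:
            phrase = phrase.strip()
            if phrase:  # skip blank lines
                a_dict[phrase] = 0
    with open(file1) as book:
        # check for each phrase on each line of the book
        for line in book:
            line = line.lower()
            for phrase in a_dict:
                # count() includes repeats of the phrase on the same line;
                # both sides are lowercased for a case-insensitive match
                a_dict[phrase] += line.count(phrase.lower())
    # return (phrase, count) pairs sorted from largest count to smallest
    return sorted(a_dict.items(), key=operator.itemgetter(1), reverse=True)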
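One case a line-by-line scan cannot see is a phrase that is split across a line break in the book. The sketch below is a variant, not the approach above: it reads the whole book at once, collapses all whitespace to single spaces, and counts with `str.count`. The name `phrase_freq_whole_text` and the sample text are made up for illustration, and it assumes the book fits comfortably in memory.

```python
import os
import tempfile

def phrase_freq_whole_text(book_path, phrases_path):
    # Read the whole book, lowercase it, and collapse runs of whitespace
    # (including newlines) so phrases split across lines still match.
    with open(book_path) as book:
        text = " ".join(book.read().lower().split())
    counts = {}
    with open(phrases_path) as phrases:
        for phrase in phrases:
            phrase = phrase.strip()
            if phrase:  # skip blank lines
                counts[phrase] = text.count(phrase.lower())
    # Largest count first; ties keep the order of the phrase file.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Made-up sample data written to temporary files.
with tempfile.TemporaryDirectory() as tmp:
    book_path = os.path.join(tmp, "book.txt")
    phrases_path = os.path.join(tmp, "phrases.txt")
    with open(book_path, "w") as f:
        f.write("The Goops are here. Have you met the\ngoops? You know, the GOOPS.\n")
    with open(phrases_path, "w") as f:
        f.write("you know\nThe Goops\nforget it\n")
    result = phrase_freq_whole_text(book_path, phrases_path)

print(result)
```

In the sample, the second "the goops" straddles a line break, so a per-line scan would miss it while this variant counts it.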


 