
Python Homework help: issues with counting integers, splitting, and returning most/least common words in text file

I am having a lot of trouble with my homework, and unfortunately my grasp of this concept isn't as strong as other students'. I have written most of my code and the idea is clear to me, but something in the code is evidently wrong. According to my assignment, I have to do this:


Students will write a program that:

  1. Accepts as user input the name of a file to process. If the named file does not exist, appropriate error handling will occur and the name of the file will be requested again. This will repeat until either a valid file name is entered or the string “ALL DONE” is entered as the file name. The program will assume that the named file is an ordinary text file and not a pickled file. Use the readline() function, not readlines(), to read the file.

  2. The file will be processed ONE LINE AT A TIME:

    a. Certain characters will be removed from the input. Using the string module, the following statement will define which characters to remove. (string.punctuation + string.whitespace).replace(' ', '')

    b. Once the specific characters are removed from the input, the remainder of the input line will be split on “word” boundaries where words are separated with the space character (' ').

    c. Each “word” of the processed input will be stored in a dictionary as a key where the value is the number of times that the word occurs in the input. However, if the “word” is an integer, the word will not be stored in the dictionary and instead will be summed so that the total of all integers in the processed file can be displayed.

  3. Once the file has been processed, the following information will be displayed:

    a. The total of all integers in the file

    b. The 5 most common words in the file

    c. The 5 least common words in the file.

    Note that it is very likely that there will be more than 5 words that are the “least common” in the file. In this case, you should print any 5 of those least common words. For example, if there are 7 words with a frequency of '1', then listing any 5 of them will suffice, but only list 5.


So, I wrote my code to the best of my ability. My code is:

#creates text file
def create_text():
   with open("hw5speech.txt", "wt") as out_file:
    out_file.write(
        """
    Lincoln's Gettysburg Address
    Given November 19, 1863
    Near Gettysburg, Pennsylvania, USA


    Four score and seven years ago, our fathers brought forth upon this continent a new nation:     conceived in liberty, and dedicated to the proposition that all men are created equal.

    Now we are engaged in a great civil war ... testing whether that
    nation, or any nation so conceived and so dedicated ... can long
    endure. We are met on a great battlefield of that war.

    We have come to dedicate a portion of that field as a final resting place for those who here     gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

    But, in a larger sense, we cannot dedicate ... we cannot consecrate ... we cannot hallow this     ground. The brave men, living and dead, who struggled here have consecrated it, far above our poor     power to add or detract. The world will little note, nor long remember, what we say here, but it can     never forget what they did here.

    It is for us the living, rather, to be dedicated here to the unfinished work which they who   fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us ... that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion ... that we here highly resolve that these dead shall not have died in vain ... that this nation, under God, shall have a new birth of freedom ... and that government of the people ... by the people ... for the people ... shall not perish from the  earth.
      """
     )
    out_file.close()

#user input to read a text file
def user_input():
    done = False
    while not done:
        file_prompt = input("What file would you like to open? (name is hw5speech.txt) \
        \n(Or, enter ALL DONE to exit) ")
        if file_prompt == "ALL DONE":
            done = True
        else:
            try:
                text_file = open(file_prompt, "rt")
                return text_file
                done = True
            #required exception handling
            except IOError:
                print("\nThat is not a valid file name. ")
                print()

#read and modify file
def read_file():
    import string
    text_file = user_input()
    for line in text_file.readline():
        myList = line.split(string.punctuation + string.whitespace)#.replace('', " ")
        #myList.split('')                                            

    #store words and integer count
    int_count = 0

    #finds if word is an integer and then adds to count of integers
    def top_integers():
        int_count = 0
        for word in myList:
            test = word.isdigit()
            if test is True:
                int_count += 1

        print("The total of all integers is: ", int_count)

    #finds the 5 most common words
    from collections import Counter
    def t5():
        t5count = Counter(myList.split())
        top5 = t5count.most_common(5)

        print("The 5 most common words are: ")
        for i in top5:
            print(i[0]) #should only print the word and not the count

    #finds the 5 least common words
    def l5():
        l5count = Counter(myList.split())
        least5 = l5count.least_common(5)

        print("The 5 least common words are: ")
        for i in least5:
            print(i[0])

    #calls the above functions
    top_integers()
    t5()
    l5()

#main function of program
def final_product():
    create_text()
    read_file()

final_product()
input("Press Enter to exit.")

So, when I run the code, I enter in the filename (hw5speech.txt). This works fine. Then, it returns The total of all integers is: 0

And then an AttributeError saying 'list' object has no attribute 'split' on line 73. Is myList having a scope issue?

There was one point in the programming process where everything actually worked without any errors. However, what would be returned would be:

The total of all integers is: 0
The 5 most common words are:
The 5 least common words are:

Press Enter to exit.

So, assuming I fix the error, I'm sure I will still get the blank output. What in the world am I doing wrong? I've looked at plenty of topics on Stack Overflow and tried different methods, but I either get errors or no values are returned. What can I look at so that I can fix my code?

Thank you all so much!

for line in text_file.readline():
        myList = ...

You're reading a single line. The for loop is iterating over the characters in the line and each time through the loop you overwrite myList. Splitting a single character returns a list with a single character in it.

That's what myList is at the end, a list of one character, the last one in the first line of the text.
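To see this in isolation, here is a minimal sketch. It assumes an in-memory io.StringIO object as a stand-in for the opened file, and the sample text and the name my_list are made up for illustration:

```python
import io
import string

# io.StringIO stands in for the opened text file (an assumption for this sketch).
text_file = io.StringIO("Four score and seven\nyears ago\n")

first_line = text_file.readline()      # one str: "Four score and seven\n"
chars = [ch for ch in first_line]      # iterating over a str yields characters

separator = string.punctuation + string.whitespace
# split() treats its argument as ONE separator string, not a set of characters,
# so a lone character comes back unchanged inside a one-element list.
my_list = None
for ch in first_line:
    my_list = ch.split(separator)      # overwritten on every pass

print(repr(first_line))
print(chars[:4])
print(my_list)
```

After the loop, my_list holds only the final character of the line, which is the trailing newline that readline() keeps.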

Then:

    for word in myList:
       test = word.isdigit()

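This runs, but the only "word" in myList is a single character (the last one in the first line, which is normally the newline that readline() keeps), so it counts no numbers, and says so.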
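Note also that the assignment asks for the total of the integers, while the loop in the question adds 1 per match and so would only count them. A small sketch of summing instead, assuming the words have already been split correctly (the sample list and the names int_total and other_words are illustrative):

```python
words = ["Given", "November", "19", "1863", "Near", "Gettysburg"]

int_total = 0
other_words = []
for word in words:
    if word.isdigit():          # True only for non-empty strings made of digits
        int_total += int(word)  # add the value itself, not 1
    else:
        other_words.append(word)

print("The total of all integers is:", int_total)
```

isdigit() does not recognise a leading minus sign, so "-5" would be treated as a word here; whether that matters depends on the input the assignment expects.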

Then:

t5count = Counter(myList.split())

and you can't split a list; split() is a string method. (If the list were built correctly, you could pass it straight into Counter.)
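As a rough illustration with a made-up word list: Counter accepts the list directly. Counter also has no least_common() method, so the l5() function in the question would fail next; one common workaround is to slice the tail of the full most_common() ranking.

```python
from collections import Counter

words = ["the", "people", "the", "nation", "the", "people", "here"]

counts = Counter(words)            # a list of words can be passed directly

top = counts.most_common(2)        # [(word, count), ...], highest count first
# Counter has no least_common(); take the tail of the full ranking instead.
bottom = counts.most_common()[:-3:-1]

print([word for word, _ in top])
print([word for word, _ in bottom])
```

For the assignment, the same idea with most_common(5) and most_common()[:-6:-1] would give five of each.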

You need to go over every line in the file with for line in text_file: , start myList as an empty list, and build it up using myList += line.split(...) or myList.extend(line.split(...)) .
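One way this could look is sketched below. Because the assignment insists on readline(), the sketch uses a while loop around readline() rather than for line in text_file. The helper name collect_words, the constants REMOVE and DELETE_TABLE, and the sample text are all hypothetical, and str.translate is just one possible way to delete the characters the assignment lists.

```python
import io
import string

# Characters to delete, as given in the assignment: everything in
# punctuation and whitespace except the space itself.
REMOVE = (string.punctuation + string.whitespace).replace(" ", "")
DELETE_TABLE = str.maketrans("", "", REMOVE)

def collect_words(text_file):
    """Return every space-separated word in text_file (hypothetical helper)."""
    words = []                                # start empty, grow line by line
    line = text_file.readline()
    while line:                               # readline() returns "" at end of file
        cleaned = line.translate(DELETE_TABLE)
        # split(" ") yields "" for repeated spaces, so drop the empty strings
        words.extend(w for w in cleaned.split(" ") if w)
        line = text_file.readline()
    return words

sample = io.StringIO("Given November 19, 1863\nNear Gettysburg,  Pennsylvania\n")
print(collect_words(sample))
```

The returned list can then be handed to the integer-summing and counting steps without calling split() on it again.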

Since this is homework, I will give you some hints rather than a full answer. When you are having a problem with a program, you need to either:

  1. Use a debugger to step through the program, making sure that you have what you expect at each stage (both type and value), or
  2. Add print statements after the operations to ensure you have what you expect (e.g. after reading the text file, print out myList to make sure it is a list and has all the lines you expect).

Also with either method you can check to see what type myList is before calling split on it.
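A tiny sketch of that kind of check, assuming myList ended up as the one-element list described in the other answer (the values and the name my_list are illustrative):

```python
my_list = ["\n"]    # what the question's loop leaves behind (assumed for this sketch)

print(type(my_list), repr(my_list))   # shows <class 'list'> and its contents

# Only strings have split(); branch on the type before calling it.
if isinstance(my_list, str):
    words = my_list.split()
else:
    words = list(my_list)             # already a list, so use it as-is
```

Seeing <class 'list'> printed just before the failing line explains the AttributeError immediately.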
