Python homework help: problems counting integers, splitting, and returning the most/least common words in a text file

Question

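I'm having a lot of trouble with my homework and, unfortunately, my grasp of this concept is not as strong as other people's. I have written most of the code and the idea is clear, but my syntax is obviously wrong. According to the assignment, I have to do the following: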

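Students will write a program that:

Accepts as user input the name of the file to process. If the named file does not exist, appropriate error handling will occur and the file name will be requested again. This repeats until either a valid file name is entered or the string "ALL DONE" is entered as the file name. The program will assume that the named file is an ordinary text file and not a pickled file. Use the readline() function, not the readlines() function, to read the file.
The file will be processed one line at a time:
a. Certain characters will be removed from the input. Using the string module, the following statement defines the characters to be removed: (string.punctuation + string.whitespace).replace(' ', '')
b. After the specified characters have been removed from the input, the rest of the input line will be split on "word" boundaries, where words are separated by the space character (' ').
c. Each "word" of the processed input will be stored as a key in a dictionary, where the value is the number of times the word occurs in the input. However, if the "word" is an integer, it will not be stored in the dictionary; it will be summed instead, so that the total of all integers in the processed file can be displayed.
Once a file has been processed, the following information will be displayed:
a. The total of all integers in the file
b. The 5 most common words in the file
c. The 5 least common words in the file.
Note that there may well be more than 5 "least common" words in the file. In that case, you should print any 5 of those least common words. For example, if 7 words have a frequency of 1, listing any 5 of them is sufficient, but list only 5.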
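The per-line processing in steps a to c can be sketched as follows. This is only a minimal illustration of the stated requirements, not the asker's code or a reference solution; the names DELETE_CHARS and process_line are hypothetical, and the sample line is taken from the speech text used later in the question.

```python
import string

# Characters to delete: punctuation plus all whitespace except the space,
# which is kept so that words can still be split on it.
DELETE_CHARS = (string.punctuation + string.whitespace).replace(" ", "")


def process_line(line, counts, int_total):
    """Update counts in place for one line and return the new integer total."""
    # str.maketrans with a third argument maps each listed character to None.
    cleaned = line.translate(str.maketrans("", "", DELETE_CHARS))
    for word in cleaned.split(" "):
        if not word:
            continue  # consecutive spaces produce empty strings
        if word.isdecimal():
            int_total += int(word)  # integers are summed, not stored
        else:
            counts[word] = counts.get(word, 0) + 1
    return int_total


counts = {}
total = process_line("Given November 19, 1863\n", counts, 0)
print(total)   # 1882
print(counts)  # {'Given': 1, 'November': 1}
```

Passing the running total in and out keeps the function free of global state, which is one possible design; a small class or a pair of module-level variables would work as well.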

So I wrote the code as best I could. My code is:

#creates text file
def create_text():
   with open("hw5speech.txt", "wt") as out_file:
    out_file.write(
        """
    Lincoln's Gettysburg Address
    Given November 19, 1863
    Near Gettysburg, Pennsylvania, USA


    Four score and seven years ago, our fathers brought forth upon this continent a new nation:     conceived in liberty, and dedicated to the proposition that all men are created equal.

    Now we are engaged in a great civil war ... testing whether that
    nation, or any nation so conceived and so dedicated ... can long
    endure. We are met on a great battlefield of that war.

    We have come to dedicate a portion of that field as a final resting place for those who here     gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

    But, in a larger sense, we cannot dedicate ... we cannot consecrate ... we cannot hallow this     ground. The brave men, living and dead, who struggled here have consecrated it, far above our poor     power to add or detract. The world will little note, nor long remember, what we say here, but it can     never forget what they did here.

    It is for us the living, rather, to be dedicated here to the unfinished work which they who   fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us ... that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion ... that we here highly resolve that these dead shall not have died in vain ... that this nation, under God, shall have a new birth of freedom ... and that government of the people ... by the people ... for the people ... shall not perish from the  earth.
      """
     )
    out_file.close()

#user input to read a text file
def user_input():
    done = False
    while not done:
        file_prompt = input("What file would you like to open? (name is hw5speech.txt) \
        \n(Or, enter ALL DONE to exit) ")
        if file_prompt == "ALL DONE":
            done = True
        else:
            try:
                text_file = open(file_prompt, "rt")
                return text_file
                done = True
            #required exception handling
            except IOError:
                print("\nThat is not a valid file name. ")
                print()

#read and modify file
def read_file():
    import string
    text_file = user_input()
    for line in text_file.readline():
        myList = line.split(string.punctuation + string.whitespace)#.replace('', " ")
        #myList.split('')                                            

    #store words and integer count
    int_count = 0

    #finds if word is an integer and then adds to count of integers
    def top_integers():
        int_count = 0
        for word in myList:
           test = word.isdigit()
           if test is True:
               int_count += 1

            print("The total of all integers is: ", int_count)

    #finds the 5 most common words
    from collections import Counter
    def t5():
        t5count = Counter(myList.split())
        top5 = t5count.most_common(5)

        print("The 5 most common words are: ")
        for i in top5:
            print(i[0]) #should only print the word and not the count

    #finds the 5 least common words
    def l5():
        l5count = Counter(myList.split())
        least5 = l5count.least_common(5)

        print("The 5 least common words are: ")
        for i in least5:
            print(i[0])

    #calls the above functions
    top_integers()
    t5()
    l5()

#main function of program
def final_product():
    create_text()
    read_file()

final_product()
input("Press Enter to exit.")

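So, when I run the code, I enter the file name (hw5speech.txt). That works fine. It then reports that the total of all integers is: 0

Then, on line 73, an AttributeError says that the 'list' object has no attribute 'split'. Is there a scoping problem with myList?

At one point while I was writing the program, everything actually ran without any errors. However, this is what it returned: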

The total of all integers is: 0
The 5 most common words are:
The 5 least common words are:

Press Enter to exit.

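So, assuming I fix the error, I'm sure I will still get the blank output. What exactly am I doing wrong? I have looked at plenty of topics on Stack Overflow and tried different approaches, but I either get errors or no values are returned. What should I be looking at to fix my code?

Thank you all very much!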

Answer 1

for line in text_file.readline():
        myList = ...

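You are reading a single line. The for loop iterates over the characters in that line, and every pass through the loop overwrites myList. Splitting a single character returns a list containing that one character.

So that is what myList ends up as: a one-item list holding the last character of the first line of the text.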
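A small demonstration of that behaviour, assuming an in-memory file (io.StringIO) with made-up contents in place of the real one. Note also that str.split(sep) treats sep as one delimiter string, not as a set of characters, so the long punctuation-plus-whitespace string never matches anything here.

```python
import io
import string

# io.StringIO stands in for the opened file (an assumption for this demo).
text_file = io.StringIO("Four score\nand seven\n")

separator = string.punctuation + string.whitespace
for line in text_file.readline():   # iterates over the characters of "Four score\n"
    myList = line.split(separator)  # overwritten on every pass

print(myList)  # ['\n']
```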

Then:

    for word in myList:
       test = word.isdigit()

So this runs, but the only "word" in myList is "s", so it does not count any numbers, and that's that.

Then:

t5count = Counter(myList.split())

And you cannot split a list. (If the list were correct, you could pass it straight to Counter.)
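As a sketch of that point, with a hypothetical word list standing in for a correctly built myList: Counter accepts the list directly. Counter also has no least_common() method, so the question's l5() would fail as well; slicing most_common() from the end is one way to get the lowest counts.

```python
from collections import Counter

# Hypothetical word list standing in for a correctly built myList.
myList = ["the", "people", "the", "nation", "the", "people", "war"]

counts = Counter(myList)            # no .split() needed: myList is already a list

top = counts.most_common(2)         # highest counts first
# Counter has no least_common() method; the tail of most_common(), reversed,
# gives the lowest counts first.
least = counts.most_common()[:-3:-1]

print(top)    # [('the', 3), ('people', 2)]
print(least)  # [('war', 1), ('nation', 1)]
```

For the assignment's five words, the corresponding calls would be most_common(5) and most_common()[:-6:-1].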

You need to loop over every line in the file with for line in text_file:, start myList as an empty list, and build it up with myList += line.split(...) or myList.extend(line.split(...)).
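A minimal sketch of that accumulation pattern, again assuming an in-memory file with made-up contents. Because the assignment asks for readline(), the second loop shows an equivalent form that calls readline() repeatedly; the variable name words is introduced here only for illustration.

```python
import io

# io.StringIO stands in for the opened text file (an assumption for this demo).
text_file = io.StringIO("we cannot dedicate\nwe cannot consecrate\n")

myList = []                      # start empty, outside the loop
for line in text_file:           # one full line per iteration
    myList.extend(line.split())  # append this line's words to the running list

print(myList)
# ['we', 'cannot', 'dedicate', 'we', 'cannot', 'consecrate']

# The assignment asks for readline(); an equivalent loop calls it repeatedly
# until it returns the empty string that marks end of file.
text_file.seek(0)
words = []
line = text_file.readline()
while line:
    words.extend(line.split())
    line = text_file.readline()

print(words == myList)  # True
```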

Answer 2

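Since this is homework, rather than answering the question I'll give you some hints. When you run into problems with a program, you need to either:

Step through the program with a debugger to make sure you have what you expect (type and value) at every stage, or
Add print statements after operations to make sure they had the effect you expect (for example, after reading the text file, print out myList to make sure it is a list and contains all the lines you expect).

Likewise, with either approach, you can check the type of myList before calling split.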
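The print-based check might look like the following sketch. The value of myList is hypothetical, and the isinstance guard is just one way to act on what the check reveals.

```python
myList = ["Four", "score", "and", "seven"]  # hypothetical value for illustration

# Print the type and a short preview before calling a method on the object.
print(type(myList).__name__, myList[:3])  # list ['Four', 'score', 'and']

# Guard the call: only strings have split(); a list can be used as it is.
words = myList.split() if isinstance(myList, str) else myList
```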

Answer 1
0 2014-11-29 07:10:33

Answer 2
0 2014-11-29 07:13:05
