Python homework help: problems counting integers, splitting, and returning the most/least common words in a text file

Question

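I'm having a lot of trouble with my homework and, unfortunately, my grasp of these concepts isn't as strong as other people's. I have written most of the code and the idea is clear, but my syntax is obviously wrong. According to the assignment, this is what I have to do: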

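Students will write a program that:

1. Accepts as user input the name of a file to process. If the named file does not exist, appropriate error handling will occur and the file name will be requested again. This repeats until either a valid file name is entered or the string "ALL DONE" is entered as the file name. The program will assume that the named file is an ordinary text file and not a pickled file. Use the readline() function, not the readlines() function, to read the file.
2. The file will be processed one line at a time:
   a. Certain characters will be removed from the input. Using the string module, the following statement defines which characters to remove: (string.punctuation + string.whitespace).replace(' ', '')
   b. Once the specific characters have been removed from the input, the rest of the input line will be split on "word" boundaries, where words are separated by the space character (' ').
   c. Each "word" of the processed input will be stored as a key in a dictionary, where the value is the number of times that word occurs in the input. However, if the "word" is an integer, it will not be stored in the dictionary; instead it will be summed, so that the total of all integers in the processed file can be displayed.
3. Once the file has been processed, the following information will be displayed:
   a. The total of all integers in the file
   b. The 5 most common words in the file
   c. The 5 least common words in the file.

Note that there may well be more than 5 "least common" words in the file. In that case, you should print any 5 of those least common words. For example, if there are 7 words with a frequency of 1, listing any 5 of them is enough, but list only 5.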

So I did my best to write the code. Here it is:

#creates text file
def create_text():
   with open("hw5speech.txt", "wt") as out_file:
    out_file.write(
        """
    Lincoln's Gettysburg Address
    Given November 19, 1863
    Near Gettysburg, Pennsylvania, USA


    Four score and seven years ago, our fathers brought forth upon this continent a new nation:     conceived in liberty, and dedicated to the proposition that all men are created equal.

    Now we are engaged in a great civil war ... testing whether that
    nation, or any nation so conceived and so dedicated ... can long
    endure. We are met on a great battlefield of that war.

    We have come to dedicate a portion of that field as a final resting place for those who here     gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

    But, in a larger sense, we cannot dedicate ... we cannot consecrate ... we cannot hallow this     ground. The brave men, living and dead, who struggled here have consecrated it, far above our poor     power to add or detract. The world will little note, nor long remember, what we say here, but it can     never forget what they did here.

    It is for us the living, rather, to be dedicated here to the unfinished work which they who   fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us ... that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion ... that we here highly resolve that these dead shall not have died in vain ... that this nation, under God, shall have a new birth of freedom ... and that government of the people ... by the people ... for the people ... shall not perish from the  earth.
      """
     )
    out_file.close()

#user input to read a text file
def user_input():
    done = False
    while not done:
        file_prompt = input("What file would you like to open? (name is hw5speech.txt) \
        \n(Or, enter ALL DONE to exit) ")
        if file_prompt == "ALL DONE":
            done = True
        else:
            try:
                text_file = open(file_prompt, "rt")
                return text_file
                done = True
            #required exception handling
            except IOError:
                print("\nThat is not a valid file name. ")
                print()

#read and modify file
def read_file():
    import string
    text_file = user_input()
    for line in text_file.readline():
        myList = line.split(string.punctuation + string.whitespace)#.replace('', " ")
        #myList.split('')                                            

    #store words and integer count
    int_count = 0

    #finds if word is an integer and then adds to count of integers
    def top_integers():
        int_count = 0
        for word in myList:
           test = word.isdigit()
           if test is True:
               int_count += 1

            print("The total of all integers is: ", int_count)

    #finds the 5 most common words
    from collections import Counter
    def t5():
        t5count = Counter(myList.split())
        top5 = t5count.most_common(5)

        print("The 5 most common words are: ")
        for i in top5:
            print(i[0]) #should only print the word and not the count

    #finds the 5 least common words
    def l5():
        l5count = Counter(myList.split())
        least5 = l5count.least_common(5)

        print("The 5 least common words are: ")
        for i in least5:
            print(i[0])

    #calls the above functions
    top_integers()
    t5()
    l5()

#main function of program
def final_product():
    create_text()
    read_file()

final_product()
input("Press Enter to exit.")

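So, when I run the code, I enter the file name (hw5speech.txt). That works fine. Then it reports that the total of all integers is: 0

Then, on line 73, an AttributeError says that the 'list' object has no attribute 'split'. Is there a scope problem with myList?

At one point while I was writing the program, everything actually ran without any errors. However, this is what it returned: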

The total of all integers is: 0
The 5 most common words are:
The 5 least common words are:

Press Enter to exit.

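So, assuming I fix the error, I'm sure I will still get the blank output. What exactly am I doing wrong? I've looked at a lot of Stack Overflow threads and tried different approaches, but either I get errors or the values don't get returned. What should I look at to fix my code?

Thank you all very much!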

Answer 1

for line in text_file.readline():
        myList = ...

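You are reading a single line. The for loop then iterates over the characters in that line, overwriting myList on every pass. Splitting a single character returns a list containing just that one character.

So that is what myList ends up as: a one-character list holding the last character of the first line of the text.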
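To see this behaviour in isolation, here is a small sketch. It assumes a hypothetical two-line in-memory file (io.StringIO) in place of hw5speech.txt, and the chars list is added only for illustration; the loop itself is the one from the question.

```python
import io
import string

# Hypothetical two-line stand-in for hw5speech.txt.
text_file = io.StringIO("Four score and 7\nyears ago 3\n")

chars = []
for line in text_file.readline():   # readline() returns ONE string...
    chars.append(line)              # ...so "line" is a single character
    # The separator here is one long string that never occurs in the
    # input, so split() just wraps the character in a list.
    myList = line.split(string.punctuation + string.whitespace)

print(chars[:4])   # ['F', 'o', 'u', 'r']
print(myList)      # ['\n']
```

The second line of the sample is never read, and the last character of the first line is its trailing newline, which is why the later output looks blank.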

Then:

    for word in myList:
       test = word.isdigit()

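That runs, but the only "word" in myList is a single character (the last character of the first line, which is its trailing newline rather than a real word), so it never finds any digits. That is why the total comes out as 0.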
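Separately, the assignment asks for the sum of the integers, while the loop in the question adds 1 for each one it finds. A minimal sketch of summing instead, assuming a hypothetical words list that has already been split correctly and plain ASCII input (so that isdigit() only matches strings int() can parse):

```python
# Hypothetical sample of already-split words.
words = ["November", "19", "1863", "nation", "7"]

int_total = 0
for word in words:
    if word.isdigit():          # non-empty and made only of digits
        int_total += int(word)  # add the value itself, not 1

print("The total of all integers is:", int_total)
```

Note that isdigit() does not treat strings with a leading minus sign as integers; whether negative numbers should count is not stated in the assignment.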

Then:

t5count = Counter(myList.split())

And you can't split a list. (If the list were correct, you could pass it straight to Counter.)
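As a rough illustration, assuming a hypothetical list of words, Counter can take the list as it is. Note also that collections.Counter has no least_common method, so the l5() function in the question would fail next; slicing the full most_common() result from the end is one common substitute.

```python
from collections import Counter

# Hypothetical sample of already-split words.
words = ["the", "people", "the", "nation", "the", "people", "war"]

counts = Counter(words)                 # pass the list itself, no .split()
top5 = counts.most_common(5)            # up to 5 (word, count) pairs, highest first
least5 = counts.most_common()[:-6:-1]   # up to 5 pairs, lowest count first

print("The 5 most common words are:")
for word, _count in top5:
    print(word)

print("The 5 least common words are:")
for word, _count in least5:
    print(word)
```

With fewer than five distinct words, as in this sample, both lists simply come back shorter.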

You need to iterate over every line in the file with for line in text_file:, start myList as an empty list, and grow it with myList += line.split(...) or myList.extend(line.split(...)).
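Here is one possible sketch of that loop. Because the assignment requires readline(), it uses a while loop around readline() rather than for line in text_file:, which is a substitution on my part. The names read_words, REMOVE and TABLE are hypothetical, and the sample text is an in-memory stand-in for the real file.

```python
import io
import string

# Characters to delete: punctuation plus all whitespace except the space,
# as the assignment text describes.
REMOVE = (string.punctuation + string.whitespace).replace(" ", "")
TABLE = str.maketrans("", "", REMOVE)

def read_words(text_file):
    """Return every space-separated word from an open text file."""
    my_list = []                       # start empty, then grow
    line = text_file.readline()
    while line:                        # readline() returns "" at end of file
        cleaned = line.translate(TABLE)
        my_list.extend(cleaned.split())  # split() also drops empty strings
        line = text_file.readline()
    return my_list

sample = io.StringIO("Four score, and 7 years ago ...\nour fathers; 1863\n")
print(read_words(sample))
```

Calling split() with no argument is a deliberate choice here: after the translate step only spaces remain as separators, and split(" ") would leave empty strings wherever two spaces are adjacent.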

Answer 2

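Since this is homework, rather than answering the question outright I'll give you some hints. When you have a problem with a program, you need to either:

Step through the program with a debugger to make sure you have what you expect (types and values) at each stage, or
Add print statements after operations to make sure they had the effect you expected (for example, after reading the text file, print myList to check that it is a list and contains all the lines you expect).

Either way, you would be able to check the type of myList before calling split on it.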
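A quick sketch of the print-statement approach, assuming a hypothetical value for myList at the point where the AttributeError was raised:

```python
# Hypothetical value of myList just before the failing call.
myList = ["Four", "score"]

print(type(myList))    # <class 'list'>
print(myList[:10])     # look at the first few items

# Only strings have .split(); a list is already split.
if isinstance(myList, str):
    words = myList.split()
else:
    words = list(myList)

print(words)
```

Seeing that the type is list rather than str points straight at the cause of the error.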

Answer 1
0 2014-11-29 07:10:33

Answer 2
0 2014-11-29 07:13:05
