
Data extraction from text file

I have the following input:

ID,       Last,      First,   Lecture, Tutorial, A1,  A2, A3,   A4,  A5
10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5
10045678, Lee,       Bruce,   L02,     T05,      4.5, 6.5, 7,   7,   8.5
00305678, Obama,     Jack,    L01,     T05,      10,  10,  9,   9.5, 10
00567890, Brown,     Palin,   L02,     T03,      4,   7.5, 6.5, 0,   5
10012134, Harper,    Ed,      L01,     T03,      10,  9,   7.5, 10,  6.5
10014549, Johnson,   Andrew,  L01,     T05,      10,  0,   10,  5.5, 7
10020987, Clockwork, Milan,   L02,     T03,      10,  8.5, 8,   9,   9
10021234, Freeman,   Skyski,  L01,     T02,      0,   10,  10,  10,  8.5
EOF

The first line of the file explains each column of the data. Let n be the total number of students; then each of the next n lines of the file corresponds to one student in the class and contains 10 fields:

  1. Student ID

  2. Last name

  3. First name

  4. Lecture section

  5. Tutorial section

  6. Grade for Assignment 1 (and so on, up to Assignment 5 in field 10)

Assuming the grades are stored in a file grades.txt, you can read an entire line of the file into a Python string s with the following Python statements:

file = open('grades.txt', 'r')
s = file.readline()

You only need to open the file once; after that you can call readline() multiple times, reading the next line each time. After the n lines of student records, the file ends with a last line that says EOF, short for End of File.
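The readline() approach described above can be sketched as a loop that stops at the EOF line. This is a minimal sketch, assuming the layout shown earlier (one header line, one comma-separated record per student, then a literal EOF line); the function name read_a1_grades and its file-object parameter are hypothetical, chosen so it can be tried on an in-memory string instead of a real grades.txt.

```python
import io


def read_a1_grades(f):
    """Return the A1 grade (6th field) of every student record in f.

    Hypothetical helper: assumes a header line, then one comma-separated
    record per student, then a final line that reads "EOF".
    """
    f.readline()                      # skip the header line
    a1_grades = []
    while True:
        line = f.readline()           # each call returns the next line
        if line == "" or line.strip() == "EOF":
            break                     # EOF marker (or the real end of the file)
        if not line.strip():
            continue                  # tolerate stray blank lines
        fields = line.split(",")
        a1_grades.append(float(fields[5]))  # float() tolerates the padding spaces
    return a1_grades


# Usage with an in-memory stand-in for grades.txt
sample = io.StringIO(
    "ID, Last, First, Lecture, Tutorial, A1, A2, A3, A4, A5\n"
    "10034567, Smith, Winston, L01, T03, 6, 5.5, 8, 10, 8.5\n"
    "10045678, Lee, Bruce, L02, T05, 4.5, 6.5, 7, 7, 8.5\n"
    "EOF\n"
)
print(read_a1_grades(sample))  # [6.0, 4.5]
```

With the real file, the same function can be called inside a with block, for example: with open('grades.txt', 'r') as f: a1 = read_a1_grades(f). The file is then opened once and closed automatically.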

The number n is not known a priori. The sample input doesn't matter: the file, named grades.txt, can contain anywhere from 100 to 300 students. We eventually wish to draw a histogram of the grade distribution for Assignment 1. Therefore, you need to extract the A1 grade of each student by processing his/her corresponding line in the file. Construct a list that has one entry per student, storing his/her A1 grade. Each time you extract a new A1 grade, append it to this list.
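The stated end goal is a histogram of the A1 grades. As a hedged sketch of that last step, assuming the A1 grades have already been collected into a list of floats, the counting can be done with collections.Counter and drawn as rows of # characters instead of with a plotting library. The helper name a1_histogram is hypothetical.

```python
from collections import Counter


def a1_histogram(a1_grades):
    """Return one text row per distinct grade, e.g. '10.0 | ####'."""
    counts = Counter(a1_grades)            # grade -> number of students
    rows = []
    for grade in sorted(counts):           # lowest grade first
        rows.append(f"{grade:4} | {'#' * counts[grade]}")
    return rows


# A1 grades of the eight sample students listed above
sample_a1 = [6.0, 4.5, 10.0, 4.0, 10.0, 10.0, 10.0, 0.0]
for row in a1_histogram(sample_a1):
    print(row)
```

Each row is right-aligned on the grade, so a grade earned by four students shows up as four # characters on the 10.0 row.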

So far, this is what I have done:

file = open('grades.txt','r')
s = file.readline()


for line in file:
    newline = str(line)
    grades = newline.split(",")
    if len(grades)<=4:
        break
    elif len(grades)>5:
        break
    else:
        grades = [float(x) for x in grades]
gradeA1 = grades[5]
print(gradeA1)

However, I only get the first grade, 6, and not the A1 grades from any of the following lines. All the A1 grades should be compiled into a list.

This is my edited code, but I still get an error:

file = open('grades.txt','r')
s = file.readline()

for s in file:
    s = file.readline()
    grades = s.split(",")
    if grades=='EOF\n':
        break
A1grades = [float(x) for x in grades[5]]   
print(A1grades)

I get an index out of range error.
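The error in the edited code can be reproduced by trying the suspicious expressions on their own. This is a diagnostic sketch based on reading the code above, not on the asker's actual traceback: when the loop finishes, grades holds the split of a line without commas (the literal EOF line, or an empty string once the loop has run past the end), so it has a single element and grades[5] fails. The grades=='EOF\n' check compares a list with a string, which is never true.

```python
# The last line of grades.txt is "EOF": it has no commas, so split()
# returns a one-element list and index 5 does not exist.
last = "EOF\n".split(",")
print(last)             # ['EOF\n']
print(len(last))        # 1

# A list never compares equal to a string, so this check cannot stop the loop.
print(last == "EOF\n")  # False

try:
    last[5]
except IndexError as err:
    print("IndexError:", err)   # IndexError: list index out of range

# Iterating over a single string visits its characters one at a time,
# so a two-digit grade turns into two separate numbers.
print([float(x) for x in "10"])  # [1.0, 0.0]
```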

For any well-formed data, the csv module is a good place to start. I suggest you read its documentation and give it a try; that should get you moving in the right direction. Otherwise, I suspect you are confused about what your list is: a list of results from the most recent line, or a list of lines. The code currently re-creates grades on every line, which may not be what you're trying to do.
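As a hedged illustration of the csv suggestion, here is a minimal sketch using csv.reader with skipinitialspace=True, which drops the padding spaces after each comma. It assumes the layout from the question (header row, student rows, final EOF row); the function name a1_grades_csv is hypothetical.

```python
import csv
import io


def a1_grades_csv(f):
    """Return the A1 grades using csv.reader (hypothetical helper name)."""
    # skipinitialspace=True drops the blanks that follow each comma
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)                          # skip the header row
    a1_grades = []
    for row in reader:
        if not row:                       # csv.reader yields [] for blank lines
            continue
        if row == ["EOF"]:                # the marker line that ends the file
            break
        a1_grades.append(float(row[5]))   # 6th column is A1
    return a1_grades


sample = io.StringIO(
    "ID,       Last,      First,   Lecture, Tutorial, A1,  A2, A3,   A4,  A5\n"
    "10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5\n"
    "00567890, Brown,     Palin,   L02,     T03,      4,   7.5, 6.5, 0,   5\n"
    "EOF\n"
)
print(a1_grades_csv(sample))  # [6.0, 4.0]
```

For the real file, the csv documentation recommends opening it with newline='', for example: with open('grades.txt', newline='') as f: a1 = a1_grades_csv(f).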

I think the problem could be that you are not reading all the lines from the file. Maybe you could do something like this:

firstLine = file.readline()
# extract from the first line the number of lines that follow
# (number_of_line is a placeholder: the header shown above contains no such count)

for x in range(number_of_line):
    line = file.readline()
    # process the information in each following line

This is one way to do it; I hope it helps.
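Because the header in this file does not state how many records follow, the count has to come from somewhere else. One hedged option, assuming the file is small enough to hold in memory (a few hundred students easily is), is to read every line with readlines() and derive the count from the list length, excluding the header and the EOF line. The function name a1_grades_by_count is hypothetical.

```python
import io


def a1_grades_by_count(f):
    """Count the records first, then loop over exactly that many lines.

    Hypothetical helper; assumes the whole file fits in memory.
    """
    lines = f.readlines()                      # header, records and EOF line
    while lines and not lines[-1].strip():
        lines.pop()                            # ignore trailing blank lines
    if lines and lines[-1].strip() == "EOF":
        lines = lines[:-1]                     # drop the EOF marker
    number_of_lines = len(lines) - 1           # records after the header
    a1_grades = []
    for x in range(1, number_of_lines + 1):    # lines[1] .. lines[number_of_lines]
        fields = lines[x].split(",")
        a1_grades.append(float(fields[5]))
    return a1_grades


sample = io.StringIO(
    "ID, Last, First, Lecture, Tutorial, A1, A2, A3, A4, A5\n"
    "10034567, Smith, Winston, L01, T03, 6, 5.5, 8, 10, 8.5\n"
    "10045678, Lee, Bruce, L02, T05, 4.5, 6.5, 7, 7, 8.5\n"
    "EOF\n"
)
print(a1_grades_by_count(sample))  # [6.0, 4.5]
```

Starting the range at 1 skips the header, and the upper bound of number_of_lines + 1 makes the last record included, which avoids the off-by-one that range(1, number_of_line) would introduce.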

I could be mistaken, but with this input:

ID,       Last,      First,   Lecture, Tutorial, A1,  A2, A3,   A4,  A5
10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5
10045678, Lee,       Bruce,   L02,     T05,      4.5, 6.5, 7,   7,   8.5
00305678, Obama,     Jack,    L01,     T05,      10,  10,  9,   9.5, 10

and this (part of the) code:

for line in file:
    newline = str(line)
    grades = newline.split(",")
    if len(grades)<=4:
        break
    elif len(grades)>5:
        break

you only keep looping when there are exactly 5 columns. I count 10 columns (split on commas). So you break out of the loop on the very first student line (never converting the grades to floats either), and only get the result from that line.
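The column count is easy to confirm by splitting one sample line and feeding it through the same if/elif/else as the question's loop. The helper name branch_taken is hypothetical; it only mirrors the original conditions.

```python
def branch_taken(line):
    """Mirror the if/elif/else from the question's loop (hypothetical helper)."""
    columns = line.split(",")
    if len(columns) <= 4:
        return "break (<= 4 columns)"
    elif len(columns) > 5:
        return "break (> 5 columns)"
    return "keep looping (exactly 5 columns)"


line = "10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5\n"
print(len(line.split(",")))   # 10
print(branch_taken(line))     # break (> 5 columns)
```

Every student line has 10 columns, so the elif branch always fires on the first line read inside the loop.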

There are several other mistakes in your code, but you may want to fix this first.

Btw, I assume this is from a Python course (judging by the phrasing of your question), so I take it you're just learning the basics. If not, or if you want to do this better, I'd go with KevinL and use the csv module.

edit

From the new code (in the edited question):

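  1. You don't have to detect the end of the file yourself: looping through the file (for s in file) stops there for you. The literal EOF line at the end is still read like any other line, though, so it must be skipped rather than split into columns.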

  2. for s in file already reads a line. There is no need for s = file.readline() again.

  3. For each split line, find the grade. Append it to a separate list that you created (empty) before the loop. This list then contains your A1 grades.

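    # assumes the header line was already consumed by s = file.readline(), as in your code
    grades = []
    for s in file:
        columns = s.split(",")
        if len(columns) < 6:  # skip lines without an A1 column, e.g. the final EOF line
            continue
        grades.append(float(columns[5]))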

But I think you need to step back and write down very carefully what your code (or my code) is doing, or what needs to be done (not in code, but in words, step by step). There is a lot you are missing programming-wise (not even Python-wise).

If this is homework, perhaps it's better to discuss it with other people doing the same assignment. Also, there is the Python tutor mailing list, which may be better suited.

No offense, just trying to give some practical advice.

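Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.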

 
© 2020-2024 STACKOOM.COM