
Data extraction from text file

I have the following input:

ID,       Last,      First,   Lecture, Tutorial, A1,  A2, A3,   A4,  A5
10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5
10045678, Lee,       Bruce,   L02,     T05,      4.5, 6.5, 7,   7,   8.5
00305678, Obama,     Jack,    L01,     T05,      10,  10,  9,   9.5, 10
00567890, Brown,     Palin,   L02,     T03,      4,   7.5, 6.5, 0,   5
10012134, Harper,    Ed,      L01,     T03,      10,  9,   7.5, 10,  6.5
10014549, Johnson,   Andrew,  L01,     T05,      10,  0,   10,  5.5, 7
10020987, Clockwork, Milan,   L02,     T03,      10,  8.5, 8,   9,   9
10021234, Freeman,   Skyski,  L01,     T02,      0,   10,  10,  10,  8.5
EOF

The first line of the file explains each column of the data. Let n be the total number of students; then each of the next n lines of the file corresponds to a student in the class and contains 10 fields:

  1. Student ID

  2. Last name

  3. First name

  4. Lecture section

  5. Tutorial section

  6. Grades for Assignment 1 (and so on...)

Assuming the grades are stored in a file named grades.txt, you can read an entire line of the file into a Python string s by using the following Python statements:

file = open('grades.txt', 'r')
s = file.readline()

You only need to open the file once; then you can call readline() multiple times to read a successive line each time. After the n lines of student records, the file ends with a last line that says EOF, short for End of File.

The number n is not known a priori. The sample input itself doesn't matter: the file, named grades.txt, can contain from 100 to 300 students. We wish to eventually draw a histogram of the grade distribution for Assignment 1. Therefore you need to extract the A1 grade for each student by processing his/her corresponding line in the file. Construct a list that will have one entry for each student, storing his/her A1 grade. Each time you extract a new A1 grade, append it to this list.

So far this is what I have done:

file = open('grades.txt','r')
s = file.readline()


for line in file:
    newline = str(line)
    grades = newline.split(",")
    if len(grades)<=4:
        break
    elif len(grades)>5:
        break
    else:
        grades = [float(x) for x in grades]
gradeA1 = grades[5]
print(gradeA1)

However, I only get the first grade, 6, and not the A1 grades from the following lines. All the A1 grades should be compiled into a list.

I have this as my edited code but I still get an error.

file = open('grades.txt','r')
s = file.readline()

for s in file:
    s = file.readline()
    grades = s.split(",")
    if grades=='EOF\n':
        break
A1grades = [float(x) for x in grades[5]]   
print(A1grades)

I get an index out of range error.

For any well-formed data, the csv module is a good place to start - I suggest you have a read of the documentation for that, and give it a try. Should get you moving in the right direction. Otherwise, I suspect you've got some confusion on what your list is - a list of results from the most recent line, or a list of lines. The code currently re-creates grades on each line, which may not be what you're trying to do...
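As a rough illustration of the csv suggestion, here is a minimal sketch. It assumes the layout shown in the question (a header row, comma-separated fields padded with spaces, and a final literal EOF line). The function name read_a1_grades and the in-memory SAMPLE text are made up for this example; with the real file you would pass open('grades.txt', newline='') instead.

```python
import csv
import io

# Hypothetical stand-in for grades.txt, shortened to two students.
SAMPLE = """ID,       Last,      First,   Lecture, Tutorial, A1,  A2, A3,   A4,  A5
10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5
10045678, Lee,       Bruce,   L02,     T05,      4.5, 6.5, 7,   7,   8.5
EOF
"""


def read_a1_grades(handle):
    """Return the A1 column as a list of floats."""
    # skipinitialspace drops the padding that follows each comma.
    reader = csv.reader(handle, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    a1_index = header.index("A1")

    a1_grades = []
    for row in reader:
        # Stop at the literal EOF marker (or at an empty line).
        if not row or row[0].strip() == "EOF":
            break
        a1_grades.append(float(row[a1_index]))
    return a1_grades


a1_grades = read_a1_grades(io.StringIO(SAMPLE))
print(a1_grades)  # [6.0, 4.5]
```

Looking the column up by its header name rather than by a fixed position is a design choice, not a requirement; it just keeps the code working if the column order ever changes.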

I think the problem could be that you are not reading all the lines from the file. Maybe you could do something like this:

firstLine = file.readline()
# extract from the first line the number of lines that follow

for x in range(1, number_of_line):
    line = file.readline()
    # process the information in each of the following lines

This is one way to do it; I hope it helps.

I could be mistaken, but, with this input:

ID,       Last,      First,   Lecture, Tutorial, A1,  A2, A3,   A4,  A5
10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5
10045678, Lee,       Bruce,   L02,     T05,      4.5, 6.5, 7,   7,   8.5
00305678, Obama,     Jack,    L01,     T05,      10,  10,  9,   9.5, 10

and this (part of the) code:

for line in file:
    newline = str(line)
    grades = newline.split(",")
    if len(grades)<=4:
        break
    elif len(grades)>5:
        break

you only keep looping when there are exactly 5 columns. I count 10 columns (split by a comma). So, you immediately break after the first line (never converting the grades to a float either), and just get the results from the first line only.
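To make the column count concrete, here is a small check on one data line copied from the question. It also shows what split returns for the final EOF line, which is a likely source of the "index out of range" error in the edited code. The variable names are only for illustration.

```python
line = "10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5\n"

columns = line.split(",")
print(len(columns))       # 10, so len(columns) > 5 is true and the loop breaks

# float() ignores surrounding whitespace, so the padded field converts fine.
a1 = float(columns[5])
print(a1)                 # 6.0

# The last line of the file has no commas at all.
eof_columns = "EOF\n".split(",")
print(eof_columns)        # ['EOF\n'], so eof_columns[5] would raise IndexError
```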

There are several other mistakes in your code, but you may want to fix this first.

Btw, I assume this is from a Python course (given the phrasing in your question), so I take it you're just learning the basics. If not, or if you want to do this better, I'd go with KevinL and use the csv module.

edit

From the new code (in the edited question):

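  1. you don't have to check for the end of the file. Looping through the file (for s in file) stops there by itself. Note that after split, grades is a list, so comparing it with the string 'EOF\n' is never true; the literal EOF line in the data still has to be skipped, because it has no sixth column.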

  2. for s in file already reads a line. No need for s = file.readline() again.

  3. For each split line, find the grade. Append it to a separate list that you created (empty) at the start. This list contains your A1 grades.

    grades = []
    for s in file:
        columns = s.split(",")
        grades.append(columns[5])
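Putting the three points together without the csv module, a fuller sketch might look like the following. It assumes, as the question states, that the first line is a header and that the data ends with a literal EOF line. The helper name extract_a1_grades and the short sample_lines list are invented for the example; an open file object can be passed in place of the list.

```python
def extract_a1_grades(lines):
    """Collect the sixth field (A1) of every student line as a float."""
    a1_grades = []
    rows = iter(lines)
    next(rows, None)              # skip the header line
    for line in rows:
        line = line.strip()
        if line == "EOF":         # literal end marker described in the question
            break
        if not line:              # ignore blank lines
            continue
        columns = line.split(",")
        a1_grades.append(float(columns[5]))
    return a1_grades


# Hypothetical stand-in for the lines of grades.txt.
sample_lines = [
    "ID,       Last,      First,   Lecture, Tutorial, A1,  A2, A3,   A4,  A5\n",
    "10034567, Smith,     Winston, L01,     T03,      6,   5.5, 8,   10,  8.5\n",
    "10045678, Lee,       Bruce,   L02,     T05,      4.5, 6.5, 7,   7,   8.5\n",
    "00305678, Obama,     Jack,    L01,     T05,      10,  10,  9,   9.5, 10\n",
    "EOF\n",
]

a1_grades = extract_a1_grades(sample_lines)
print(a1_grades)  # [6.0, 4.5, 10.0]

# With the real file, something like this should work:
# with open("grades.txt") as f:
#     a1_grades = extract_a1_grades(f)
```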

But I think you need to step back and write down very carefully what your code (or my code) is doing, or what needs to be done (not in code, but in words, step by step). There is a lot you are missing programming-wise (not even Python-wise).

If this is homework, perhaps it's better to discuss it with other people who are working on it. Also, there is the Python tutor mailing list, which may be better suited.

No offense, just trying to give some practical advice.
