
Extracting .csv file to 2D list without using any library

As part of an assignment, I have to read a .csv file into a 2D list without using any library. The header and the first three rows are as follows:

"ID","Name","Sex","Age","Height","Weight","Team","NOC","Games","Year","Season","City","Sport","Event","Medal"
"1","A Dijiang","M",24,180,80,"China","CHN","1992 Summer",1992,"Summer","Barcelona","Basketball","Basketball Men's Basketball",NA
"2","A Lamusi","M",23,170,60,"China","CHN","2012 Summer",2012,"Summer","London","Judo","Judo Men's Extra-Lightweight",NA
"3","Gunnar Nielsen Aaby","M",24,NA,NA,"Denmark","DEN","1920 Summer",1920,"Summer","Antwerpen","Football","Football Men's Football",NA

I tried to implement it as follows:

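csv_data = []
with open('olympic.csv') as csv_file:
    for line in csv_file:
        line = line.strip()
        # Naive split: this also cuts quoted fields that contain a comma.
        line = line.split(',')
        temp = []
        for element in line:
            # Strip the surrounding quotes only when both are present
            # (this also avoids an IndexError on empty fields).
            if len(element) >= 2 and element[0] == '"' and element[-1] == '"':
                temp.append(element[1 : -1])
            else:
                temp.append(element)
        csv_data.append(temp)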

This gives approximately the right answer, but it breaks when the Name or Event column contains a "," character. For example:

"," in Name column
"5965","Dionisio Augustine, II","M",24,153,65,"Federated States of Micronesia","FSM","2016 Summer",2016,"Summer","Rio de Janeiro","Swimming","Swimming Men's 50 metres Freestyle",NA
"7208","Carlos Zenon Balderas, Jr.","M",19,175,60,"United States","USA","2016 Summer",2016,"Summer","Rio de Janeiro","Boxing","Boxing Men's Lightweight",NA

"," in Event column
"2304","Michael Albasini","M",31,172,67,"Switzerland","SUI","2012 Summer",2012,"Summer","London","Cycling","Cycling Men's Road Race, Individual",NA
"250","Saeid Morad Abdevali","M",22,170,80,"Iran","IRI","2012 Summer",2012,"Summer","London","Wrestling","Wrestling Men's Welterweight, Greco-Roman",NA

Is there any proper method to solve this problem without using standard libraries?

Yeah... and after that you may also have to cope with escaped quote characters, and then (why not?) with newlines inside a column...

That's why, in real life, the best strategy is to use a library instead of reinventing the wheel (a whole complicated clockwork, in fact).

You may try to use a regular expression to capture a column value. For quoted columns, a naive one could be something like '"([^"]+)"'; for unquoted ones (numbers?) maybe one with lookarounds: '(?<=,)(\d+)(?=,)'... and then try to put everything together.
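As a rough illustration of putting the two patterns together, here is a minimal sketch. The function name split_csv_line_regex and the combined pattern are my own assumptions, not something from the assignment. It also relies on the standard re module, so it may not be allowed if "no library" includes the standard library, and it ignores escaped quotes and newlines inside a field.

```python
import re

# Hypothetical combined pattern: each field is preceded by the start of
# the line or a comma, and is either quoted (group 1) or unquoted (group 2).
FIELD = re.compile(r'(?:^|,)(?:"([^"]*)"|([^,]*))')


def split_csv_line_regex(line):
    """Split one CSV line into fields; naive, no escaped quotes."""
    fields = []
    for match in FIELD.finditer(line):
        quoted, unquoted = match.group(1), match.group(2)
        # Exactly one of the two groups takes part in each match.
        fields.append(quoted if quoted is not None else unquoted)
    return fields


sample = '"2304","Michael Albasini","M",31,"Cycling Men\'s Road Race, Individual",NA'
print(split_csv_line_regex(sample))
```

The quoted alternative is tried first, so a comma inside quotes stays part of the field instead of ending it.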

Or (this being a class assignment, efficiency and speed are perhaps not compulsory) you might write a state machine: read one character at a time and act accordingly. If it's a '"', go on reading up to the closing '"'; otherwise read on up to the next comma, and so on...
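The state machine idea could look roughly like the sketch below, which uses no imports at all. The names parse_csv_line and read_csv are made up for illustration. I am assuming that a doubled quote ("") inside a quoted field stands for a literal quote, which is a common CSV convention but may not apply to your file. Because the file is read line by line, a newline inside a quoted column is still not handled.

```python
def parse_csv_line(line):
    """Split one CSV line into fields, honoring double-quoted fields."""
    fields = []
    current = []          # characters of the field being built
    in_quotes = False     # the only state: inside or outside quotes
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                # Assumption: "" inside a quoted field is an escaped quote.
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')
                    i += 1
                else:
                    in_quotes = False
            else:
                current.append(ch)   # commas are ordinary characters here
        elif ch == '"':
            in_quotes = True
        elif ch == ',':
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append(''.join(current))  # the last field has no trailing comma
    return fields


def read_csv(path):
    """Read a whole file into a 2D list, one inner list per line."""
    rows = []
    with open(path) as csv_file:
        for line in csv_file:
            rows.append(parse_csv_line(line.rstrip('\r\n')))
    return rows
```

With this, csv_data = read_csv('olympic.csv') would replace the loop in the question. Every value comes back as a string, so numeric columns such as Age still need converting, and NA needs whatever treatment the assignment expects.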
