
Reading csv file in python

With the following data and code snippet, I get the error shown below. Can you please help me with this? I am a beginner in Python.

Data:

"Id","Title","Body","Tags"
"Id1","Tit,le1","Body1","Ta,gs1"
"Id","Title","Body","Ta,2gs"

Code:

#!/usr/bin/python
import csv,sys
if len(sys.argv) <> 3:
    print >>sys.stderr, 'Wrong number of arguments. This tool will print first n records from a comma separated CSV file.'
    print >>sys.stderr, 'Usage:'
    print >>sys.stderr, '       python', sys.argv[0], '<file> <number-of-lines>'
    sys.exit(1)

fileName = sys.argv[1]
n = int(sys.argv[2])

i = 0 
out = csv.writer(sys.stdout, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)

ret = []


def read_csv(file_path, has_header = True):
    with open(file_path) as f:
        if has_header: f.readline()
        data = []
        for line in f:
            line = line.strip().split("\",\"")
            data.append([x for x in line])
    return data


ret = read_csv(fileName)
target = []
train = []
target = [x[2] for x in ret]
train = [x[1] for x in ret]

Error:

    target = [x[2] for x in ret]
IndexError: list index out of range

You are mixing file.readline() and using the file object as an iterable. Don't do that. Use next() instead.

You should also use csv.reader() from the csv module to read your data; don't reinvent this wheel. The csv module handles quoted CSV values with delimiters embedded in the values much better in any case:

import csv

def read_csv(file_path, has_header=True):
    with open(file_path, 'rb') as f:
        reader = csv.reader(f)
        if has_header: next(reader, None)
        return list(reader)
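To make the difference concrete, here is a minimal sketch that parses the sample data from the question with csv.reader() and compares it with the manual split. It assumes Python 3 (the code above targets Python 2) and reads from an in-memory string rather than a file; the names SAMPLE and read_rows are made up for this illustration.

```python
import csv
import io

# Sample data from the question, held in memory instead of a file.
SAMPLE = (
    '"Id","Title","Body","Tags"\n'
    '"Id1","Tit,le1","Body1","Ta,gs1"\n'
    '"Id","Title","Body","Ta,2gs"\n'
)

def read_rows(text, has_header=True):
    """Parse CSV text into a list of rows, optionally skipping the header."""
    reader = csv.reader(io.StringIO(text, newline=""))
    if has_header:
        next(reader, None)  # skip the header row; no error on empty input
    return list(reader)

rows = read_rows(SAMPLE)

# The manual approach from the question leaves stray quote characters
# on the first and last fields.
naive = SAMPLE.splitlines()[1].strip().split('","')

print(rows[0])
print(naive)
```

The csv.reader() rows contain clean field values with the embedded commas preserved, while the manually split row still carries the opening and closing quote characters.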

Last but not least, you can use zip() to transpose rows and columns:

ret = read_csv(fileName)
train, target = zip(*ret)[1:3]  # just the 2nd and 3rd columns (train = x[1], target = x[2])

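Here zip() truncates every column to the length of the shortest row, so a row with too few columns no longer raises the IndexError you see. Note that if any row has fewer than three columns, the slice holds fewer than two columns and the tuple unpacking fails with a ValueError instead.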
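As a small illustration of the transposition, the sketch below runs zip() over two rows of unequal length. It assumes Python 3, where zip() returns a lazy iterator that has to be wrapped in list() before slicing; the second row is a hypothetical short row, not part of the question's data. Following the question's code, train is column index 1 and target is column index 2.

```python
rows = [
    ["Id1", "Tit,le1", "Body1", "Ta,gs1"],
    ["Id2", "Title2", "Body2"],  # hypothetical short row: no Tags column
]

# Python 3: zip() is lazy, so materialize it before slicing.
columns = list(zip(*rows))

# Only three columns survive, because the shortest row has three fields.
train, target = columns[1:3]

print(len(columns))
print(train)
print(target)
```

The fourth column is dropped entirely because the second row has no value for it.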

If there are columns missing in some of the rows, use itertools.izip_longest() instead (itertools.zip_longest() in Python 3):

from itertools import izip_longest

ret = read_csv(fileName)
# izip_longest() returns an iterator, so build a list before slicing
train, target = list(izip_longest(*ret))[1:3]  # just the 2nd and 3rd columns

The default is to replace missing columns with None; if you need to use a different value, pass a fillvalue argument to izip_longest():

train, target = list(izip_longest(*ret, fillvalue=0))[1:3]  # just the 2nd and 3rd columns
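For comparison, here is a minimal sketch of the padded transposition, assuming Python 3 (where the function is named zip_longest). The rows are hypothetical, and the empty-string fill value is just one possible choice for text columns.

```python
from itertools import zip_longest

rows = [
    ["Id1", "Title1", "Body1"],
    ["Id2", "Title2"],  # hypothetical row with the Body column missing
]

# zip_longest() returns an iterator, so build a list before slicing.
columns = list(zip_longest(*rows, fillvalue=""))

# Column index 1 is Title (train), column index 2 is Body (target).
train, target = columns[1:3]

print(train)
print(target)
```

The short row contributes the fill value to the target column instead of shortening every column.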
