
Reading a CSV file in Python

With the following data and code snippet, I am getting the error shown below. Can you please help me with this? I am a beginner in Python.

Data:

"Id","Title","Body","Tags"
"Id1","Tit,le1","Body1","Ta,gs1"
"Id","Title","Body","Ta,2gs"

Code:

#!/usr/bin/python 
import csv,sys
if len(sys.argv) != 3:
    print >>sys.stderr, 'Wrong number of arguments. This tool will print first n records from a comma separated CSV file.'
    print >>sys.stderr, 'Usage:'
    print >>sys.stderr, '       python', sys.argv[0], '<file> <number-of-lines>'
    sys.exit(1)

fileName = sys.argv[1]
n = int(sys.argv[2])

i = 0 
out = csv.writer(sys.stdout, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)

ret = []


def read_csv(file_path, has_header = True):
    with open(file_path) as f:
        if has_header: f.readline()
        data = []
        for line in f:
            line = line.strip().split("\",\"")
            data.append([x for x in line])
    return data


ret = read_csv(fileName)
target = []
train = []
target = [x[2] for x in ret]
train = [x[1] for x in ret]

Error:

    target = [x[2] for x in ret]
IndexError: list index out of range

You are mixing file.readline() and using the file object as an iterable. Don't do that. Use next() instead.
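As a rough illustration of the next() approach, here is a minimal sketch written for Python 3. It uses an in-memory io.StringIO as a stand-in for the real file; the sample text and the helper name skip_header_lines are assumptions made for this example, not part of the original code.

```python
import io

def skip_header_lines(f, has_header=True):
    """Return the remaining lines of f, skipping the first one if has_header."""
    if has_header:
        # next() advances the same iterator that the loop below consumes;
        # the None default avoids StopIteration on an empty file.
        next(f, None)
    return [line.rstrip("\n") for line in f]

# Hypothetical sample input standing in for a file opened with open().
sample = io.StringIO('"Id","Title"\n"Id1","Title1"\n"Id2","Title2"\n')
rows = skip_header_lines(sample)
print(rows)
```

The header is consumed and iteration continues through the same iterator, so no second way of reading the file is involved.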

You should also use the csv module, via csv.reader(), to read your data; don't reinvent this wheel. The csv module handles quoted CSV values with delimiters embedded in the values much better in any case:

import csv

def read_csv(file_path, has_header=True):
    with open(file_path, 'rb') as f:
        reader = csv.reader(f)
        if has_header: next(reader, None)
        return list(reader)
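The function above is written for Python 2, where the csv module expects a file opened in binary mode. Under Python 3 the module expects a text-mode file opened with newline=''. The following is a hedged sketch of the same function for Python 3, run against the question's sample data; the temporary file and the variable names sample, path and rows are assumptions made for this example.

```python
import csv
import os
import tempfile

def read_csv(file_path, has_header=True):
    # Python 3: open in text mode; newline='' lets csv handle line endings itself.
    with open(file_path, newline='') as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row, if there is one
        return list(reader)

# The sample data from the question, written to a temporary file.
sample = (
    '"Id","Title","Body","Tags"\n'
    '"Id1","Tit,le1","Body1","Ta,gs1"\n'
    '"Id","Title","Body","Ta,2gs"\n'
)

with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    rows = read_csv(path)
finally:
    os.remove(path)  # clean up the temporary file

for row in rows:
    print(row)
```

Each row comes back as a list of four fields, with the surrounding quotes removed and the embedded commas kept inside their fields.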

Last but not least, you can use zip() to transpose rows and columns:

ret = read_csv(fileName)
train, target = zip(*ret)[1:3]  # 2nd and 3rd columns, matching x[1] and x[2] in the question

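Here zip() stops at the length of the shortest row, so any columns beyond it are dropped; at the very least this avoids the IndexError you see. Note that if the shortest row has fewer than three columns, the slice yields fewer than two columns and the unpacking raises a ValueError instead.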
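To illustrate the truncation, here is a small sketch for Python 3, where zip() returns an iterator and so has to be wrapped in list() before slicing. The rows are made-up sample values, and the names titles and bodies are chosen only for this example; the last row is deliberately missing its final column.

```python
rows = [
    ["Id1", "Title1", "Body1", "Tags1"],
    ["Id2", "Title2", "Body2", "Tags2"],
    ["Id3", "Title3", "Body3"],  # one column short
]

# Transpose rows into columns; zip() stops at the shortest row,
# so the fourth column is dropped entirely.
columns = list(zip(*rows))
titles, bodies = columns[1:3]  # 2nd and 3rd columns

print(len(columns))
print(titles)
print(bodies)
```

Only three columns survive the transposition here, because the shortest row has three fields.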

If there are columns missing in some of the rows, use itertools.izip_longest() instead (itertools.zip_longest() in Python 3):

from itertools import izip_longest

ret = read_csv(fileName)
# izip_longest() returns an iterator, so build a list before slicing
train, target = list(izip_longest(*ret))[1:3]  # 2nd and 3rd columns

The default is to replace missing columns with None; if you need to use a different value, pass a fillvalue argument to izip_longest():

train, target = list(izip_longest(*ret, fillvalue=0))[1:3]  # 2nd and 3rd columns
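For Python 3, a comparable sketch using itertools.zip_longest() might look like the following. The rows are invented sample values, and the empty-string fillvalue and the names titles and bodies are assumptions chosen for this example.

```python
from itertools import zip_longest

rows = [
    ["Id1", "Title1", "Body1"],
    ["Id2", "Title2"],  # Body column missing
]

# zip_longest() pads short rows with fillvalue instead of truncating;
# it returns an iterator, so build a list before slicing.
columns = list(zip_longest(*rows, fillvalue=""))
titles, bodies = columns[1:3]  # 2nd and 3rd columns

print(titles)
print(bodies)
```

The missing Body value in the second row shows up as the fill value, so both columns keep one entry per row.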
