
csv.reader misses first line

I am using csv.reader in Python to read a CSV file into a dictionary. The first column of the CSV is a date (in one of two possible formats), which is parsed into a datetime object and used as the dict key. I also read columns 3 and 4:

import datetime as dt
import csv
with open(fileInput,'r') as inFile:
    csv_in = csv.reader(inFile)
    try:
        dictData = {(dt.datetime.strptime(rows[0], '%d/%m/%Y %H:%M')): [rows[3], rows[4]]
                        for rows in csv_in}
    except:
        dictData = {(dt.datetime.strptime(rows[0], '%Y-%m-%d %H:%M:%S')): [rows[3], rows[4]]
                        for rows in csv_in}

It works, except that the first date in the file (1/7/2012 00:00) doesn't appear in the dictionary. Do I need to tell csv.reader that the first row is not a header row, and if so, how?

When you run a try/except statement, it is easy to believe that Python will first try something and, if that fails, restore your environment to the state it was in before the try statement was executed. It does not do this. Anything the try block changed before the exception was raised stays changed, so you have to be aware of unintended side effects of a failed try attempt.
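
A minimal illustration, using a plain list instead of the CSV reader: the append performed inside the try block survives the exception that follows it.

```python
items = []

try:
    items.append("added inside try")  # side effect happens first
    int("not a number")               # then this raises ValueError
except ValueError:
    pass

# The append was not undone when the exception was raised.
print(items)  # ['added inside try']
```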

What is happening in your case is that the dictionary comprehension calls next(...) on your csv.reader() object (csv_in), which returns the first line of the CSV file. That first item of the csv.reader() iterator is now used up, and Python does not put it back when the try block fails.
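
The same consumption can be seen directly with csv.reader. This sketch uses io.StringIO with made-up rows as a stand-in for the real file. It also shows that csv.reader has no concept of a header row: the first line comes back as an ordinary row.

```python
import csv
import io

# Made-up two-row "file"; csv.reader treats the first line like any other row.
reader = csv.reader(io.StringIO("1/7/2012 00:00,a\n2/7/2012 00:00,b\n"))

first = next(reader)  # consumes the first row
rest = list(reader)   # continues from where next() left off

print(first)  # ['1/7/2012 00:00', 'a']
print(rest)   # [['2/7/2012 00:00', 'b']]
```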

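An exception is then raised, presumably a ValueError from strptime because the date does not match the first format. When the except block takes over and its comprehension calls next(...) on your csv_in object, you get the second item of the iterator, because the first has already been used. Since only the first row goes missing, the failure must happen on that very first row; any rows consumed before a later failure would be lost in the same way.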
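
A small reproduction of the problem, assuming the file is actually in the second format ('%Y-%m-%d %H:%M:%S'). The sample rows are made up, io.StringIO stands in for the real file, and except ValueError is used in place of the bare except.

```python
import csv
import datetime as dt
import io

# Made-up sample data, written in the second date format.
sample = io.StringIO(
    "2012-07-01 00:00:00,a,b,1,2\n"
    "2012-07-01 01:00:00,a,b,3,4\n"
)
csv_in = csv.reader(sample)

try:
    # Pulls the first row from csv_in, then strptime raises ValueError.
    d = {dt.datetime.strptime(row[0], "%d/%m/%Y %H:%M"): [row[3], row[4]]
         for row in csv_in}
except ValueError:
    # csv_in is not rewound, so this comprehension starts at the second row.
    d = {dt.datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S"): [row[3], row[4]]
         for row in csv_in}

print(d)  # only the 01:00 row is present; the 00:00 row is lost
```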

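A simple way around this is to read the rows into a list first. A list, unlike the csv.reader iterator, can be iterated over any number of times, so the except block starts again from the first row. Copying the reader with copy.copy() does not help: even if a copy could be made, it would still read from the same open file, which has already moved past the first line.

import datetime as dt
import csv

# newline='' is what the csv documentation recommends for files given to csv.reader
with open(fileInput, 'r', newline='') as inFile:
    # Read every row once. Unlike the csv.reader iterator, a list can be
    # iterated over again after a failed attempt.
    all_rows = list(csv.reader(inFile))

try:
    dictData = {dt.datetime.strptime(row[0], '%d/%m/%Y %H:%M'):
                    [row[3], row[4]] for row in all_rows}
except ValueError:
    # Starts again from the first row, because the list was not consumed.
    dictData = {dt.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S'):
                    [row[3], row[4]] for row in all_rows}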
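
An alternative that sidesteps the retry entirely is to try both formats on each date as it is read, so every row is consumed exactly once. This is a minimal sketch assuming the two formats from the question; parse_date, read_dict and the sample rows are hypothetical. Unlike picking one format for the whole file, it also copes with a file that mixes both formats.

```python
import csv
import datetime as dt
import io

# The two formats from the question, tried in this order.
DATE_FORMATS = ("%d/%m/%Y %H:%M", "%Y-%m-%d %H:%M:%S")


def parse_date(text):
    """Hypothetical helper: return a datetime, trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return dt.datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"unrecognised date: {text!r}")


def read_dict(in_file):
    """Hypothetical helper: single pass over the reader, one row at a time."""
    return {parse_date(row[0]): [row[3], row[4]] for row in csv.reader(in_file)}


# Made-up sample rows, one in each format.
sample = io.StringIO(
    "2012-07-01 00:00:00,a,b,1,2\n"
    "1/7/2012 01:00,a,b,3,4\n"
)
print(read_dict(sample))
```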

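Finally, I would recommend against catching every exception with a bare except: (or a generic except Exception:). What you want to catch here is ValueError, which is what strptime raises when the string does not match the format.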
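
A minimal sketch of why the narrower clause matters. parse_row and try_parse are hypothetical helpers and the rows are made up: a date in the wrong format is handled, while a row that is too short still raises IndexError instead of being silently swallowed.

```python
import datetime as dt


def parse_row(row):
    """Hypothetical helper: parse one CSV row using the first date format."""
    return dt.datetime.strptime(row[0], "%d/%m/%Y %H:%M"), [row[3], row[4]]


def try_parse(row):
    try:
        return parse_row(row)
    except ValueError:
        # Only a date that does not match the format ends up here.
        return None


# A date in the other format: strptime raises ValueError, which is handled.
print(try_parse(["2012-07-01 00:00:00", "a", "b", "1", "2"]))  # None

# A row that is too short raises IndexError. `except ValueError` does not
# catch it, so the real problem surfaces.
try:
    try_parse(["1/7/2012 00:00"])
except IndexError as exc:
    print("IndexError:", exc)
```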

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 