Get raw line in csv.reader?

I'm making a wrapper around csv.reader that will let the reader keep working even if it encounters a malformed line (i.e., one that contains a NUL byte). It looks like:

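import csv
import logging

logger = logging.getLogger(__name__)
def error_ignoring_csv_reader(csv_reader):
    while True:
        try:
            yield next(csv_reader)
        except StopIteration:
            return  # input exhausted; end the generator cleanly
        except csv.Error as e:
            logger.warning("Got badly formed line with error [%s]", e)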

What I'd really like to do is include the raw problematic line in the logged warning, as in "Got badly formed line [actual_raw_line]", but after reading over csv's source code I haven't found any way of accessing it. Is it possible to access the raw, unprocessed line that csv.reader is currently on?

Although I am not aware of a way to access the raw, unprocessed current line directly in csv.reader, these objects do make available a csvreader.line_num attribute that can provide a foundation for obtaining the current line.

The csvfile argument to csv.reader "can be any object which supports the iterator protocol and returns a string each time its next() method is called" (the method is named __next__() in Python 3), so it does not have to be a plain file. The reference documentation specifically mentions a list of strings as an option.

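If you can first read the file into a list, you can use the line_num attribute to index into the list in case of an error. Because line_num counts the lines read so far, the line that triggered the error is at index line_num - 1 (for a record that spans several lines, that is the last line read for that record). Alternatively, you could go back and re-read lines from the file to find the problematic line.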
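A minimal sketch of the list-based approach, assuming the whole input fits in memory. The function name read_rows_logging_bad_lines and the reader_kwargs pass-through are made up for illustration; only csv.reader, its line_num attribute, and csv.Error come from the csv module itself.

```python
import csv
import logging

logger = logging.getLogger(__name__)


def read_rows_logging_bad_lines(lines, **reader_kwargs):
    """Yield parsed rows from a list of strings; log the raw text of lines that fail.

    `lines` must be a list (not a one-shot iterator) so it can be indexed later.
    """
    reader = csv.reader(lines, **reader_kwargs)
    while True:
        try:
            row = next(reader)
        except StopIteration:
            return  # input exhausted
        except csv.Error as e:
            # line_num is the count of lines consumed so far, so the line
            # that raised the error sits at index line_num - 1.
            raw = lines[reader.line_num - 1]
            logger.warning("Got badly formed line [%r] with error [%s]", raw, e)
            continue
        yield row
```

For example, with lines = ['a,b\n', 'c,"d"x\n', 'e,f\n'] and strict=True passed through to csv.reader, the second line is rejected and logged while the first and third rows are still yielded.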

Or you could build a customized iterator that reads from the file and also remembers the last line read. (With this last approach your code would create your special iterator and pass that to csv.reader. You should not even need to use the line_num attribute in that case.)
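The customized-iterator idea could look roughly like the sketch below. LineRememberingIterator and rows_with_raw_line_logging are hypothetical names chosen for this example, not part of the csv module. The wrapper hands each line to csv.reader unchanged and keeps a reference to it, so the raw text is still available when csv.Error is raised.

```python
import csv
import logging

logger = logging.getLogger(__name__)


class LineRememberingIterator:
    """Wrap any iterable of lines and keep the most recent one in `last_line`."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.last_line = None

    def __iter__(self):
        return self

    def __next__(self):
        # Store the line before handing it to csv.reader.
        self.last_line = next(self._lines)
        return self.last_line


def rows_with_raw_line_logging(csvfile, **reader_kwargs):
    """Yield good rows from csvfile; log the raw line whenever parsing fails."""
    source = LineRememberingIterator(csvfile)
    reader = csv.reader(source, **reader_kwargs)
    while True:
        try:
            row = next(reader)
        except StopIteration:
            return  # input exhausted
        except csv.Error as e:
            logger.warning(
                "Got badly formed line [%r] with error [%s]", source.last_line, e
            )
            continue
        yield row
```

Unlike the list-based approach, this works on a file object streamed line by line. One caveat: for a quoted record that spans several physical lines, last_line holds only the final line read, not the whole record.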
