Get raw line in csv.reader?

I'm making a wrapper around csv.reader that will let the reader keep working even if it encounters a malformed line (for example, one that contains a NUL byte). It looks like:

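import csv
import logging
logger = logging.getLogger(__name__)

def error_ignoring_csv_reader(csv_reader):
    while True:
        try:
            yield next(csv_reader)
        except StopIteration:
            return  # input exhausted; must not leak out of a generator (PEP 479)
        except csv.Error as e:
            logger.warning("Got badly formed line with error [%s]", e)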

What I'd really like to do is include the raw problematic line in the logged warning, as in "Got badly formed line [actual_raw_line]", but after reading through csv's source code I haven't found any way of accessing it. Is it possible to access the raw, unprocessed line that csv.reader is currently on?

I am not aware of a way to access the raw, unprocessed current line directly in csv.reader, but reader objects do expose a csvreader.line_num attribute (the number of lines read from the source iterator), which can serve as a foundation for obtaining the current line.

The csvfile argument to csv.reader "can be any object which supports the iterator protocol and returns a string each time its next() method is called" (in Python 3 the method is __next__()), so it does not have to be a plain file. The reference documentation specifically mentions a list of strings as an option.

If you can first read the file into a list, you can use the line_num attribute to index into the list when an error occurs. Because line_num counts the lines read so far, the line that was just consumed is at index line_num - 1. Alternatively, you could go back and re-read lines from the file to find the problematic one.
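As a rough sketch of the list-based approach, assuming the whole input fits in memory: the function name parse_lines_keeping_bad is hypothetical, and strict=True appears in the usage only so that a stray character after a closing quote reliably raises csv.Error.

```python
import csv
import logging

logger = logging.getLogger(__name__)


def parse_lines_keeping_bad(lines, **reader_kwargs):
    """Parse a list of text lines; return (rows, bad_lines).

    Hypothetical helper: `lines` must be an indexable list so the raw
    text can be looked up again through reader.line_num.
    """
    reader = csv.reader(lines, **reader_kwargs)
    rows, bad_lines = [], []
    while True:
        try:
            rows.append(next(reader))
        except StopIteration:
            break
        except csv.Error as e:
            # line_num is the count of lines consumed so far, so the
            # line that triggered the error sits at index line_num - 1.
            raw = lines[reader.line_num - 1]
            bad_lines.append(raw)
            logger.warning("Got badly formed line [%r] with error [%s]", raw, e)
    return rows, bad_lines


# Usage: the second line has a character after a closing quote,
# which strict mode rejects.
sample = ['a,b\n', 'c,"d"e\n', 'f,g\n']
rows, bad_lines = parse_lines_keeping_bad(sample, strict=True)
```

One caveat: when a record spans several physical lines (a quoted field containing newlines), line_num - 1 points only at the last physical line read, not at the start of the record.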

Or you could build a customized iterator that reads from the file and also remembers the last line it read. (With this last approach, your code would create the special iterator and pass it to csv.reader. You should not even need the line_num attribute in that case.)
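The custom-iterator idea might look something like the following minimal sketch. The names LineRememberingIterator and error_logging_csv_reader are made up for illustration; the only assumption about csv.reader is that it pulls one line at a time from whatever iterable it is given.

```python
import csv
import logging

logger = logging.getLogger(__name__)


class LineRememberingIterator:
    """Wrap any iterable of lines and remember the last line handed out."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self.last_line = None

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the underlying iterator propagates unchanged.
        self.last_line = next(self._lines)
        return self.last_line


def error_logging_csv_reader(csvfile, **reader_kwargs):
    """Yield parsed rows, logging the raw text of any line csv rejects."""
    source = LineRememberingIterator(csvfile)
    reader = csv.reader(source, **reader_kwargs)
    while True:
        try:
            row = next(reader)
        except StopIteration:
            return
        except csv.Error as e:
            logger.warning(
                "Got badly formed line [%r] with error [%s]", source.last_line, e
            )
            continue
        yield row
```

Since the wrapper only iterates, it works with an open file object as well as with a list, and it never holds more than one line in memory. For a record that spans several physical lines, last_line holds only the most recent one.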
