
pandas.read_csv not erroring on a bad line with extra columns

As far as I can tell, this only occurs if the bad line is the first line of data.

I have a simple CSV file like so:

ownername,streetno,streetname
me,320,main st,just,absolute,garbage
you,40,mint ave

The command I'm using to read the file is:

read_csv(file, sep=',', header=0, quotechar=None, quoting=csv.QUOTE_NONE, index_col=False)

As long as the extra values (just,absolute,garbage) occur on the first row of data, it parses the file without errors and gives me the DataFrame below:

  ownername  streetno streetname
0        me       320    main st
1       you        40   mint ave

That's not the worst result, but for what I'm working on, I'd prefer to error on any mismatch between the number of column headers and the number of data columns. Setting error_bad_lines=True had no effect.

Am I missing something here? Is this intended behavior? If it is, is there any way to bypass it or make it more strict?

error_bad_lines is True by default, which is what normally causes the exception to be raised on a line with too many fields. If you set it to False, it will skip the erroneous lines instead.
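
To illustrate the two settings, here is a minimal sketch. It assumes pandas 1.3 or later, where error_bad_lines was deprecated (and later removed in 2.0) in favor of on_bad_lines; "error" and "skip" play the roles of True and False. The sample data is made up for this example, with the overlong line deliberately placed on the second data row so that the check actually fires.

```python
import io

import pandas as pd

# Hypothetical sample: the overlong line is the SECOND data row, not the first.
data = (
    "ownername,streetno,streetname\n"
    "you,40,mint ave\n"
    "me,320,main st,just,absolute,garbage\n"
)

# on_bad_lines="error" is the default (like error_bad_lines=True): parsing raises.
try:
    pd.read_csv(io.StringIO(data), sep=",", index_col=False, on_bad_lines="error")
    raised = False
except pd.errors.ParserError:
    raised = True

# on_bad_lines="skip" (like error_bad_lines=False): the overlong line is dropped.
skipped = pd.read_csv(io.StringIO(data), sep=",", index_col=False, on_bad_lines="skip")
```

Here raised ends up True and skipped keeps only the well-formed row. Neither setting changes what happens when the overlong line is the first data row, which is the case in the question.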

I have also found from my testing that truncation of bad data only occurs on the first line of data. It may be worth creating an issue.
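
Until the parser itself is stricter, one possible workaround is to validate the field counts yourself before handing the file to read_csv. The sketch below is not a pandas feature: check_field_counts is a hypothetical helper written for this answer, using only the standard csv module with the same delimiter and QUOTE_NONE setting as in the question.

```python
import csv
import io


def check_field_counts(lines, delimiter=","):
    """Raise ValueError if any non-blank row's field count differs from the header's.

    Hypothetical helper for illustration; `lines` is any iterable of text lines,
    such as an open file object.
    """
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = next(reader)
    for row in reader:
        # Blank lines come back as empty lists; they are ignored here.
        if row and len(row) != len(header):
            raise ValueError(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
            )


# The sample file from the question, with the bad line as the first data row.
sample = io.StringIO(
    "ownername,streetno,streetname\n"
    "me,320,main st,just,absolute,garbage\n"
    "you,40,mint ave\n"
)

try:
    check_field_counts(sample)
    message = None
except ValueError as exc:
    message = str(exc)
```

If the check passes, rewind or reopen the file and call read_csv as before. Note that this check also rejects rows with too few fields, which matches the "any mismatch" requirement in the question but is stricter than what pandas does on its own.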
