As far as I can tell, this only occurs if the bad line is the first line of data
I have a simple csv file like so:
ownername,streetno,streetname
me,320,main st,just,absolute,garbage
you,40,mint ave
The command I'm using to read the file is
pd.read_csv(file, sep=',', header=0, quotechar=None, quoting=csv.QUOTE_NONE, index_col=False)
As long as the extra values (just,absolute,garbage) occur on the first row of data, the file parses without errors, giving me the DataFrame below:
  ownername streetno streetname
0        me      320    main st
1       you       40   mint ave
That's not the worst result, but for what I'm working on, I'd prefer an error on any mismatch between the number of column headers and the number of data columns. Setting error_bad_lines=True had no effect.
Am I missing something here? Is this intended behavior? If it is intended behavior, is there any way to bypass it or make it more strict?
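For what it's worth, the asymmetry is easy to demonstrate with a small sketch. This uses in-memory strings rather than the real file, and the exact behavior may vary across pandas versions:

```python
import io

import pandas as pd

# Bad line is NOT the first data row: pandas raises a ParserError.
try:
    pd.read_csv(io.StringIO("a,b\n1,2\n3,4,5\n"))
except pd.errors.ParserError as e:
    print("raised:", e)

# Bad line IS the first data row: no error is raised; pandas silently
# reinterprets the extra leading field as an index column instead.
df = pd.read_csv(io.StringIO("a,b\n1,2,3\n"))
print(df)
```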
error_bad_lines is True by default, which is what causes the exception to be raised. If you set it to False, it will skip the erroneous lines.
I have also found from my testing that truncation of bad data only occurs on the first line. It may be worth creating an issue.
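In the meantime, if you need a hard guarantee of an error on any column-count mismatch, one option is to pre-validate field counts with the stdlib csv module before handing the file to pandas. The helper below (check_field_counts, a hypothetical name sketched for illustration) returns every offending line:

```python
import csv
import io

def check_field_counts(text):
    # Hypothetical helper: return (line_number, field_count) for every row
    # whose field count differs from the header row's.
    rows = list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE))
    expected = len(rows[0])
    return [(i, len(r)) for i, r in enumerate(rows[1:], start=2)
            if len(r) != expected]

sample = ("ownername,streetno,streetname\n"
          "me,320,main st,just,absolute,garbage\n"
          "you,40,mint ave\n")
print(check_field_counts(sample))  # -> [(2, 6)]
```

Raise an exception whenever the returned list is non-empty, and you get the strict behavior the question asks for regardless of where the bad line appears.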