
pandas.read_csv not erroring on a bad line with extra columns

As far as I can tell, this only occurs if the bad line is the first line of data

I have a simple csv file like so:

ownername,streetno,streetname
me,320,main st,just,absolute,garbage
you,40,mint ave

The command I'm using to read the file is

read_csv(file, ',', header=0, quotechar=None, quoting=csv.QUOTE_NONE, index_col=False)

As long as the extra values (just,absolute,garbage) occur on the first row of data, it parses the file without errors, giving me the DataFrame below:

  ownername  streetno streetname
0        me       320    main st
1       you        40   mint ave

That's not the worst result, but for what I'm working on, I'd prefer to error on any mismatch between the number of column headers and the number of data columns. Setting error_bad_lines=True had no effect.

Am I missing something here? Is this intended behavior? If it is intended behavior, is there any way to bypass it or make it more strict?

error_bad_lines is True by default, which is what causes the exception to be raised. If you set it to False, it will skip the erroneous lines.
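For illustration, here is a minimal sketch of that behaviour, assuming a pandas version from the era of this question where error_bad_lines still exists (it was deprecated in 1.3 in favour of on_bad_lines and later removed). When the extra fields appear on a row other than the first data row, the default raises a ParserError, while error_bad_lines=False skips the row:

import io

import pandas as pd

# Same data as above, but the malformed row is the *second* data row.
data = """ownername,streetno,streetname
you,40,mint ave
me,320,main st,just,absolute,garbage
"""

# Default (error_bad_lines=True in older pandas): tokenizing fails.
try:
    pd.read_csv(io.StringIO(data), index_col=False)
except pd.errors.ParserError as exc:
    print("raised:", exc)

# error_bad_lines=False: the offending row is skipped (with a warning).
df = pd.read_csv(io.StringIO(data), index_col=False, error_bad_lines=False)
print(df)

The case this does not cover is the one from the question, where the malformed row is the first data row.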

I have also found from my testing that truncation of bad data only occurs on the first line. It may be worth creating an issue.
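If you need a hard failure regardless of where the malformed row appears, one possible workaround (just a sketch, not part of pandas; the helper name and file name below are made up for illustration) is to check every row's field count against the header with the csv module before handing the file to read_csv:

import csv

import pandas as pd


def read_csv_strict(path, **kwargs):
    """Raise if any data row's field count differs from the header's,
    then fall back to pandas.read_csv. This also catches the case
    where the first data row is the malformed one."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        for lineno, row in enumerate(reader, start=2):
            if not row:  # ignore blank lines
                continue
            if len(row) != len(header):
                raise ValueError(
                    f"line {lineno}: expected {len(header)} fields, got {len(row)}"
                )
    return pd.read_csv(path, **kwargs)


# Hypothetical usage:
# df = read_csv_strict("addresses.csv", index_col=False)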
