
pandas.read_csv not erroring on a bad line with extra columns

As far as I can tell, this only occurs if the bad line is the first line of data

I have a simple csv file like so:

ownername,streetno,streetname
me,320,main st,just,absolute,garbage
you,40,mint ave

The command I'm using to read the file is

read_csv(file, ',', header=0, quotechar=None, quoting=csv.QUOTE_NONE, index_col=False)

As long as the extra values (just,absolute,garbage) occur on the first row of data, it parses the file without errors, giving me the DataFrame below:

  ownername  streetno streetname
0        me       320    main st
1       you        40   mint ave

That's not the worst result, but for what I'm working on, I'd prefer to error on any mismatch between the number of column headers and the number of data columns. Setting error_bad_lines=True had no effect.

Am I missing something here? Is this intended behavior? If it is intended behavior, is there any way to bypass it or make it more strict?

error_bad_lines is True by default, which is what causes the exception to be raised. If you set it to False, it will skip the erroneous lines.
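For illustration, here is a minimal sketch of that behaviour, assuming a pandas version from the era of this question where error_bad_lines still exists (it was deprecated in 1.3 in favour of on_bad_lines and later removed). When the extra fields appear on a row other than the first data row, the default raises a ParserError, while error_bad_lines=False skips the row:

import io

import pandas as pd

# Same data as above, but the malformed row is the *second* data row.
data = """ownername,streetno,streetname
you,40,mint ave
me,320,main st,just,absolute,garbage
"""

# Default (error_bad_lines=True in older pandas): tokenizing fails.
try:
    pd.read_csv(io.StringIO(data), index_col=False)
except pd.errors.ParserError as exc:
    print("raised:", exc)

# error_bad_lines=False: the offending row is skipped (with a warning).
df = pd.read_csv(io.StringIO(data), index_col=False, error_bad_lines=False)
print(df)

The case this does not cover is the one from the question, where the malformed row is the first data row.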

I have also found from my testing that truncation of bad data only occurs on the first line. It may be worth creating an issue.
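If you need a hard failure regardless of where the malformed row appears, one possible workaround (just a sketch, not part of pandas; the helper name and file name below are made up for illustration) is to check every row's field count against the header with the csv module before handing the file to read_csv:

import csv

import pandas as pd


def read_csv_strict(path, **kwargs):
    """Raise if any data row's field count differs from the header's,
    then fall back to pandas.read_csv. This also catches the case
    where the first data row is the malformed one."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        for lineno, row in enumerate(reader, start=2):
            if not row:  # ignore blank lines
                continue
            if len(row) != len(header):
                raise ValueError(
                    f"line {lineno}: expected {len(header)} fields, got {len(row)}"
                )
    return pd.read_csv(path, **kwargs)


# Hypothetical usage:
# df = read_csv_strict("addresses.csv", index_col=False)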
