
How to skip or ignore NULL bytes in a CSV file using pd.read_csv?

I have a .csv file with hundreds of lines and columns that look like this (a small example; I couldn't copy and paste the NULL bytes, so I had to type them in manually):

9142,16.04000000,14.65000000
<0x00><0x00><0x00>
9143,16.19000000,14.65000000

A small number of lines contain NULL bytes ("<0x00>"), which cause trouble when I try to read the CSV with pandas' pd.read_csv.

When I run the command:

pd.read_csv(fname, header=None, na_values='-32768', names=binnams, engine='python')

I get the following error:

pandas.errors.ParserError: ("NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead", 'occurred at index 16')

and when I switch to engine='c' I get:

TypeError: ('cannot unpack non-iterable NoneType object', 'occurred at index 16')

Is there a way to ignore these lines completely using pd.read_csv?

I think a workaround might be to open the files, loop through them, and delete any lines that contain <0x00>, assuming those lines can be read at all?
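The workaround described above could be sketched roughly as follows. This is only a minimal sketch: the helper name drop_null_lines is hypothetical, and it assumes the rest of the file is UTF-8 (or plain ASCII) text. It reads the file in binary mode, skips every line that contains a NULL byte, and returns the remaining lines as an in-memory text buffer.

```python
import io


def drop_null_lines(path, encoding="utf-8"):
    """Return a text buffer with the lines of `path` that contain no NULL byte.

    Hypothetical helper; `encoding` is an assumption about the file's contents.
    """
    kept = []
    # Binary mode: lines are split on b"\n" and nothing is decoded yet.
    with open(path, "rb") as fh:
        for raw in fh:
            if b"\x00" in raw:  # skip any line holding at least one NULL byte
                continue
            kept.append(raw.decode(encoding))
    return io.StringIO("".join(kept))
```

Since pd.read_csv accepts file-like objects, the buffer could then be passed in place of the file name, for example pd.read_csv(drop_null_lines(fname), header=None, na_values='-32768', names=binnams).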

Any thoughts/suggestions are definitely appreciated.

EDIT: I tried to read the files line by line to see if I could delete these lines, but I'm not sure how to actually match the NULL byte (using "<0x00>" obviously didn't work :D).

Link to an example file: https://drive.google.com/open?id=1uEjMv0Be9Hu_AqXRzqB3enrWilzCTBvc
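On matching the NULL byte: "<0x00>" is most likely just how an editor displays it. In Python source the character is written with the escape "\x00" (or b"\x00" for bytes). A small sketch for locating the affected lines, where the function name null_line_numbers is made up for illustration and the file is assumed to be mostly UTF-8 text:

```python
NULL = "\x00"  # the actual NULL character, not the literal text "<0x00>"


def null_line_numbers(path, encoding="utf-8"):
    """Return the 1-based numbers of the lines in `path` that contain a NULL byte."""
    hits = []
    # errors="replace" keeps the scan going if other undecodable bytes show up.
    with open(path, "r", encoding=encoding, errors="replace", newline="") as fh:
        for lineno, line in enumerate(fh, start=1):
            if NULL in line:
                hits.append(lineno)
    return hits
```

Running this on a file first shows how many lines are affected and where they are, before deciding whether to drop them.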

Try saving the CSV file as UTF-16, then run the code again:

pd.read_csv(fname, header=None, na_values='-32768', names=binnams, engine='python')
