
Pandas Read_Parquet NaN error: ValueError: cannot convert float NaN to integer

I need help handling a NaN error I get when reading a parquet file in Python 3. One column contains NaN values, and reading the file fails. Per requirement, I cannot change the content of this file, so I cannot go in and fix the offending column.

Here is the code I am using to read the file:

df = pd.read_parquet("parquet_file.parquet")

Here is the error:

ValueError: cannot convert float NaN to integer

Thanks.

Having come across a similar issue, I found the solution (in my case) was to pip install pyarrow. The documentation for read_parquet describes the engine argument: the default, auto, tries pyarrow first and falls back to the fastparquet library if pyarrow is not available.

After pip installing pyarrow, rerunning the code seems to solve the issue.
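If you want to check which engine pandas is likely to pick, or force pyarrow explicitly rather than relying on auto, something like the following might help. This is only a sketch: the helper names available_parquet_engines and read_with_pyarrow are made up for illustration, and it assumes pandas and pyarrow are installed when the read is actually called.

```python
import importlib.util


def available_parquet_engines():
    """Return which of the two parquet engines are importable, pyarrow first."""
    candidates = ["pyarrow", "fastparquet"]
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def read_with_pyarrow(path):
    """Read a parquet file, asking pandas for the pyarrow engine explicitly."""
    import pandas as pd  # imported lazily so the engine check above works without pandas

    # engine="pyarrow" raises ImportError if pyarrow is missing,
    # instead of silently falling back to fastparquet.
    return pd.read_parquet(path, engine="pyarrow")
```

For example, print(available_parquet_engines()) shows what is installed, and df = read_with_pyarrow("parquet_file.parquet") reads the file from the question.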

pandas.read_parquet("parquet_file.parquet", columns=["column1", "column2"])

You can specify the list of columns you want to read from the file, excluding the column that is causing the issue.
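If the file has many columns, typing out every column except the bad one gets tedious. One possible approach, sketched below, is to read only the column names from the file's schema and drop the offending one. The helper names columns_except and read_without_columns and the column name "bad_column" are hypothetical, and the schema lookup assumes pyarrow is installed.

```python
def columns_except(all_columns, excluded):
    """Return all_columns in their original order, minus any names in excluded."""
    excluded = set(excluded)
    return [name for name in all_columns if name not in excluded]


def read_without_columns(path, excluded):
    """Read a parquet file while skipping the columns listed in excluded."""
    import pandas as pd
    import pyarrow.parquet as pq

    # read_schema reads only the file metadata, not the column data.
    all_columns = pq.read_schema(path).names
    return pd.read_parquet(path, columns=columns_except(all_columns, excluded))
```

Usage would look like df = read_without_columns("parquet_file.parquet", ["bad_column"]), with "bad_column" replaced by the real name of the column that triggers the error.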

