
modin.pandas.read_parquet() fails with KeyError: 'ETag' when reading a partitioned Parquet dataset from S3

I created a DataFrame with pandas and used to_parquet(...) to write it directly to S3.

The arguments were:

df.to_parquet('s3://bucket/fn.parquet', compression='gzip', engine='fastparquet', partition_cols=['col1'])

When I use pandas.read_parquet(url), the DataFrame loads fine.

But when I use modin.pandas.read_parquet(url), I get the following error:

 File "/home/mguo/anaconda3/envs/testenv/lib/python3.7/site-packages/s3fs/core.py", line 1779, in __init__
    self.req_kw["IfMatch"] = self.details["ETag"]
KeyError: 'ETag'

These are my versions:

python==3.7.3
pandas==1.2.4
modin==0.10.0
s3fs==2021.6.0
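
For anyone comparing their environment against the list above, here is a minimal sketch that prints the installed versions of the packages involved. The package list (including fastparquet, which is only named as the engine in the to_parquet call) and the helper name installed_versions are my own choices for illustration. It relies on importlib.metadata, which needs Python 3.8 or later; on Python 3.7, as used here, the importlib_metadata backport would have to stand in.

```python
from importlib import metadata

# Package names taken from the question; adjust as needed.
PACKAGES = ["pandas", "modin", "s3fs", "fastparquet"]


def installed_versions(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name} (not installed)")
```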

This issue was tracked on GitHub here and fixed here.
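
If upgrading to a release that contains the fix is not possible right away, one possible stopgap is to fall back to the reader that works. This is only a sketch of that idea, not the fix from the GitHub issue: read_with_fallback is a hypothetical helper of my own, and it assumes the failure surfaces as a KeyError, as in the traceback above.

```python
def read_with_fallback(path, primary_reader, fallback_reader, **kwargs):
    """Call primary_reader(path); if it raises KeyError, retry with fallback_reader.

    Hypothetical helper for illustration only. Catching KeyError is broad and
    would also hide unrelated KeyErrors raised by the primary reader.
    """
    try:
        return primary_reader(path, **kwargs)
    except KeyError as exc:
        # The traceback in the question ends in KeyError: 'ETag' inside s3fs.
        print(f"primary reader raised KeyError {exc}; using fallback reader")
        return fallback_reader(path, **kwargs)


# Intended usage (needs pandas, modin, s3fs and S3 access, so it is left commented out):
#
#   import pandas
#   import modin.pandas as mpd
#   df = read_with_fallback("s3://bucket/fn.parquet",
#                           mpd.read_parquet, pandas.read_parquet)
```

Note that when the fallback is taken, the result is a plain pandas DataFrame rather than a Modin one, so the read itself is not distributed.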

Another user posted a link to the GitHub issue in an answer here, but it was deleted. Mods, if you see this post, please don't delete it.
