modin pandas read_parquet() failed on ETag KeyError trying to read a partitioned parquet from s3

Question

I created a DataFrame with pandas and wrote it directly to S3 using to_parquet(...).

The call was:

df.to_parquet('s3://bucket/fn.parquet', compression='gzip', engine='fastparquet', partition_cols=['col1'])

When I use pandas.read_parquet(url), the DataFrame loads fine.

But when I use modin.pandas.read_parquet(url), I get the following error:

 File "/home/mguo/anaconda3/envs/testenv/lib/python3.7/site-packages/s3fs/core.py", line 1779, in __init__
    self.req_kw["IfMatch"] = self.details["ETag"]
KeyError: 'ETag'
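
Put together, the steps above amount to roughly the following reproduction. This is only a sketch: the sample data is invented, the function name reproduce is hypothetical, and the bucket path is the placeholder from the call above. Running it requires S3 credentials plus pandas, modin, fastparquet, and s3fs to be installed.

```python
def reproduce(url="s3://bucket/fn.parquet"):
    """Write a partitioned parquet dataset with pandas, then read it back
    with pandas and with Modin."""
    # Imports are local so that defining this function needs no third-party packages.
    import pandas
    import modin.pandas as mpd

    # Invented sample data; only the partition column name 'col1' comes from the question.
    df = pandas.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 3]})
    df.to_parquet(
        url,
        compression="gzip",
        engine="fastparquet",
        partition_cols=["col1"],
    )

    pandas_df = pandas.read_parquet(url)  # reported above to load fine
    modin_df = mpd.read_parquet(url)      # reported above to raise KeyError: 'ETag'
    return pandas_df, modin_df
```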

My versions are:

python==3.7.3
pandas==1.2.4
modin==0.10.0
s3fs==2021.6.0
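
To compare another environment against the versions listed above, something like the following may help. It is a minimal sketch that uses only the standard library; the helper name installed_versions is hypothetical, and importlib.metadata requires Python 3.8 or later (on Python 3.7, the importlib_metadata backport offers the same interface).

```python
from importlib import metadata


def installed_versions(packages=("pandas", "modin", "s3fs")):
    """Return a dict mapping each package name to its installed version,
    or to None when the package is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        if version is None:
            print(f"{name} is not installed")
        else:
            print(f"{name}=={version}")
```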

Answer 1

This issue was tracked on GitHub and has since been fixed.

Another user posted a link to the GitHub issue in an answer here, but it was deleted. Mods, if you see this post, please don't delete it.
