Load Pandas Dataframe to S3 passing s3_additional_kwargs
Please excuse my ignorance / lack of knowledge in this area!
I'm looking to upload a dataframe to S3, but I need to pass 'ACL':'bucket-owner-full-control'.
import pandas as pd
import s3fs
fs = s3fs.S3FileSystem(anon=False, s3_additional_kwargs={'ACL': 'bucket-owner-full-control'})
df = pd.DataFrame()
df['test'] = [1,2,3]
df.head()
df.to_parquet('s3://path/to/file/df.parquet', compression='gzip')
I have managed to get around this by converting the dataframe to a PyArrow table and then writing it like this:
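import pyarrow as pa
import pyarrow.parquet as pq
# Convert the dataframe to an Arrow table, then write it through the s3fs filesystem that carries the ACL
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table=table,
                    root_path='s3://path/to/file/',
                    filesystem=fs)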
But this feels hacky and I feel there must be a way to pass the ACL in the first example.
You can do this:
df.to_parquet('name.parquet', storage_options={"key": xxxxx, "secret": gcp_secret_access_key, 'xxxxx': {'ACL': 'bucket-owner-full-control'}})
With Pandas 1.2.0, there is storage_options, as mentioned here.
If you are stuck with Pandas < 1.2.0 (1.1.3 in my case), this trick did help:
storage_options = dict(anon=False, s3_additional_kwargs=dict(ACL="bucket-owner-full-control"))
import s3fs
fs = s3fs.S3FileSystem(**storage_options)
df.to_parquet('s3://foo/bar.parquet', filesystem=fs)
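If the same code has to run on both older and newer Pandas, the two approaches can be chosen at runtime. The following is only a sketch: parquet_write_kwargs is a hypothetical helper name, not part of Pandas or s3fs, and it assumes the version string starts with numeric major and minor parts.

```python
def parquet_write_kwargs(pandas_version, acl="bucket-owner-full-control"):
    """Return extra keyword arguments for df.to_parquet that set the S3 ACL.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    extra = {"s3_additional_kwargs": {"ACL": acl}}
    # Compare only the numeric major and minor components, e.g. "1.1.3" -> (1, 1).
    major, minor = (int(part) for part in pandas_version.split(".")[:2])
    if (major, minor) >= (1, 2):
        # Pandas >= 1.2.0 accepts storage_options and hands it to s3fs.
        return {"storage_options": extra}
    # Older Pandas: build the filesystem by hand, as in the trick above.
    import s3fs  # imported lazily so the newer code path does not need it here
    return {"filesystem": s3fs.S3FileSystem(anon=False, **extra)}

# Usage (needs pandas, s3fs and valid AWS credentials):
#   import pandas as pd
#   df.to_parquet("s3://foo/bar.parquet", **parquet_write_kwargs(pd.__version__))
```

Only the branch for older versions imports s3fs directly, so the helper can be defined even where s3fs is not installed.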
As mentioned before, with Pandas 1.2.0 there is a storage_options argument to most writer functions (to_csv, to_parquet, etc.). To set the ACL when writing to S3 (in this case the file system backend that is used is s3fs) you can use this example:
ACL = dict(storage_options=dict(s3_additional_kwargs=dict(ACL='bucket-owner-full-control')))
import pandas as pd
df = pd.DataFrame({"column": [1,2,3,4]})
df.to_parquet("s3://bucket/file.parquet", **ACL)
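Because storage_options is shared by the writer functions, the nested dictionary can be built once and reused, for example with to_csv. A minimal sketch, assuming a hypothetical helper named s3_acl_options; the check against S3's canned object ACL names is an optional extra, not something Pandas or s3fs requires.

```python
# Canned ACL names that S3 accepts for object uploads.
CANNED_OBJECT_ACLS = {
    "private",
    "public-read",
    "public-read-write",
    "authenticated-read",
    "aws-exec-read",
    "bucket-owner-read",
    "bucket-owner-full-control",
}

def s3_acl_options(acl="bucket-owner-full-control"):
    """Build a storage_options dict that asks s3fs to send the given ACL.

    Hypothetical helper: the name is an assumption for illustration.
    """
    if acl not in CANNED_OBJECT_ACLS:
        raise ValueError(f"unknown canned ACL: {acl!r}")
    return {"s3_additional_kwargs": {"ACL": acl}}

# Usage (needs pandas >= 1.2.0, s3fs and valid AWS credentials):
#   df.to_csv("s3://bucket/file.csv", storage_options=s3_acl_options())
#   df.to_parquet("s3://bucket/file.parquet", storage_options=s3_acl_options())
```

Failing early on a misspelled ACL name avoids discovering the typo only when the upload request is rejected.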