
Write a pandas DataFrame to Parquet in AWS S3

I want to write my DataFrame to my S3 bucket in Parquet format. I know how to write it as CSV, but not as Parquet. Here is the code for CSV (the real values of ServerSideEncryption and SSEKMSKeyId are masked here, but my actual code uses them):

csv_to_write = df.to_csv(None).encode()
s3_client.put_object(Bucket=bucket_name,Key='data.csv', Body=csv_to_write,
              ServerSideEncryption='XXXXX', SSEKMSKeyId='XXXXXXXX')

Does anyone have the equivalent for Parquet? Thanks.

For Python 3.6+, AWS provides a library called aws-data-wrangler (imported as awswrangler) that handles the integration between pandas, S3 and Parquet.

To install it:

pip install awswrangler

To write your pandas DataFrame as a Parquet file to S3:

import awswrangler as wr
# Writes df as a single Parquet file at the given S3 path
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/key/my-file.parquet"
)

To add KMS server-side encryption, pass the extra S3 arguments:

import awswrangler as wr
extra_args = {
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "YOUR_KMS_KEY_ARN"
}
# s3_additional_kwargs is forwarded to the underlying S3 upload requests
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/key/my-file.parquet",
    s3_additional_kwargs=extra_args
)

Assuming your DataFrame is called df, the following code first writes it to a Parquet file on local disk and then uploads that file to S3.

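import pyarrow as pa
import pyarrow.parquet as pq
import boto3

# Placeholder names: replace with your own values
local_file_name = 'data.parquet'
bucket_name = 'my-bucket'
remote_file_name = 'key/data.parquet'

# Convert the DataFrame to an Arrow table and write it to a local Parquet file
parquet_table = pa.Table.from_pandas(df)
pq.write_table(parquet_table, local_file_name)

# Upload the local file. The credentials are placeholders; omit both arguments
# to let boto3 use its default credential chain (env vars, ~/.aws, IAM role)
s3 = boto3.client('s3', aws_access_key_id='XXX', aws_secret_access_key='XXX')
s3.upload_file(local_file_name, bucket_name, remote_file_name)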

Excellent solution above using AWS Data Wrangler, but I got an error when I ran the example, presumably because the library's API has changed. The following worked for me:

wr.s3.to_parquet(df, path=f"s3://{output_bucket}/{output_key}.parquet", index=False)
