
Upload Gzip file using Boto3

I am trying to gzip files before uploading them to S3. With the code below, the files that end up in S3 show no change in size, so I am trying to figure out what I have missed.

import gzip
import shutil
from io import BytesIO


def upload_gzipped(bucket, key, fp, compressed_fp=None, content_type='text/plain'):
    """Compress and upload the contents from fp to S3.

    If compressed_fp is None, the compression is performed in memory.
    """
    if not compressed_fp:
        compressed_fp = BytesIO()
    with gzip.GzipFile(fileobj=compressed_fp, mode='wb') as gz:
        shutil.copyfileobj(fp, gz)
    compressed_fp.seek(0)
    bucket.upload_fileobj(
        compressed_fp,
        key,
        {'ContentType': content_type, 'ContentEncoding': 'gzip'})
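Before involving S3 at all, the helper's output can be inspected with a stand-in bucket object (`FakeBucket` below is hypothetical, not part of boto3): if the bytes handed to `upload_fileobj` start with the gzip magic number `1f 8b` and are smaller than the input, the in-memory compression step is working.

```python
import gzip
import shutil
from io import BytesIO


def upload_gzipped(bucket, key, fp, compressed_fp=None, content_type='text/plain'):
    """Same logic as the gist above: compress fp, then upload."""
    if not compressed_fp:
        compressed_fp = BytesIO()
    with gzip.GzipFile(fileobj=compressed_fp, mode='wb') as gz:
        shutil.copyfileobj(fp, gz)
    compressed_fp.seek(0)
    bucket.upload_fileobj(
        compressed_fp, key,
        {'ContentType': content_type, 'ContentEncoding': 'gzip'})


class FakeBucket:
    """Minimal stand-in for a boto3 Bucket: records the bytes that
    upload_fileobj would send, so the payload can be inspected."""

    def __init__(self):
        self.uploads = {}

    def upload_fileobj(self, fileobj, key, extra_args=None):
        self.uploads[key] = fileobj.read()


bucket = FakeBucket()
upload_gzipped(bucket, 'sample.txt', BytesIO(b'hello world\n' * 200))
payload = bucket.uploads['sample.txt']
print(len(payload))                 # size of the bytes that would reach S3
print(payload[:2] == b'\x1f\x8b')   # True when the payload is gzip-framed
```

Comparing `len(payload)` with the size of the original stream tells you whether the size mismatch happens before or after the upload.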

Courtesy link for the source.

And this is how I am using the function: reading files as a stream from SFTP, gzipping them, and then writing them to S3.

with pysftp.Connection(host_name, username=user, password=password, cnopts=cnopts, port=int(port)) as sftp:
    list_of_files = sftp.listdir('{}{}'.format(base_path, file_path))
    is_file_found = False
    for file_name in list_of_files:
        if entity_name in str(file_name.lower()):
            is_file_found = True
            flo = BytesIO()
            # Step 1: Read File Using SFTP as input Stream
            sftp.getfo('{}{}/{}'.format(base_path, file_path, file_name), flo)
            s3_destination_key = '{}/{}'.format(s3_path, file_name)
            # Step 2: Write files to destination S3
            logger.info('Moving file to S3 {} '.format(s3_destination_key))
            # Creating a bucket resource to use bucket object for file upload
            input_bucket_object = S3.Bucket(environment_config['S3_INBOX_BUCKET'])
            flo.seek(0)
            upload_gzipped(input_bucket_object, s3_destination_key, flo)

It seems like the upload_gzipped function uses shutil.copyfileobj incorrectly.

Looking at https://docs.python.org/3/library/shutil.html#shutil.copyfileobj shows that the source goes first and the destination second.

Also, you're just writing your object to a gzipped object without ever actually compressing it.

You need to compress fp into a Gzip object, then upload that specific object to S3.

I'd recommend not using that gist from GitHub, as it appears to be wrong.
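A minimal sketch of the "compress first, then upload" approach, using `gzip.compress` so the compressed payload can be size-checked before it is handed to boto3 (the commented usage reuses names from the question and is illustrative only):

```python
import gzip
from io import BytesIO


def gzip_to_fileobj(raw: bytes) -> BytesIO:
    """Compress raw bytes in memory and return a file-like object
    positioned at the start, ready for Bucket.upload_fileobj."""
    return BytesIO(gzip.compress(raw))


# Hypothetical usage with the names from the question:
# compressed = gzip_to_fileobj(flo.getvalue())
# input_bucket_object.upload_fileobj(
#     compressed,
#     s3_destination_key,
#     ExtraArgs={'ContentType': 'text/plain', 'ContentEncoding': 'gzip'})
```

Because the compressed bytes exist as a concrete value, `len(gzip.compress(raw))` can be logged and compared against the object size S3 reports after the upload.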
