i am trying to upload files to S3 before that i am trying to Gzip files, if you see the code below, the files uploaded to the S3 have no change in the size, so i am trying to figure out if i have missed something.
import gzip
import shutil
from io import BytesIO
def upload_gzipped(bucket, key, fp, compressed_fp=None, content_type='text/plain'):
"""Compress and upload the contents from fp to S3.
If compressed_fp is None, the compression is performed in memory.
"""
if not compressed_fp:
compressed_fp = BytesIO()
with gzip.GzipFile(fileobj=compressed_fp, mode='wb') as gz:
shutil.copyfileobj(fp, gz)
compressed_fp.seek(0)
bucket.upload_fileobj(
compressed_fp,
key,
{'ContentType': content_type, 'ContentEncoding': 'gzip'})
Courtesy Link for the source
And this is how i am using this fucntion, so basically reading files as stream from SFTP and then trying to Gzip them and then write them to S3.
with pysftp.Connection(host_name, username=user, password=password, cnopts=cnopts, port=int(port)) as sftp:
list_of_files = sftp.listdir('{}{}'.format(base_path, file_path))
is_file_found = False
for file_name in list_of_files:
if entity_name in str(file_name.lower()):
is_file_found = True
flo = BytesIO()
# Step 1: Read File Using SFTP as input Stream
sftp.getfo('{}{}/{}'.format(base_path, file_path, file_name), flo)
s3_destination_key = '{}/{}'.format(s3_path, file_name)
# Step 2: Write files to desitination S3
logger.info('Moving file to S3 {} '.format(s3_destination_key))
# Creating a bucket resource to use bucket object for file upload
input_bucket_object = S3.Bucket(environment_config['S3_INBOX_BUCKET'])
flo.seek(0)
upload_gzipped(input_bucket_object, s3_destination_key, flo)
It seems like the upload_gzipped
function uses shutil.copyfileobj
incorrectly.
Looking at https://docs.python.org/3/library/shutil.html#shutil.copyfileobj shows that you put the source first, and destination second.
Also, you're just writing your object to a gzipped object without ever actually compressing it.
You need to compress fp
into a Gzip object, then upload that specific object to S3.
I'd recommend not using that gist from github as it seems wrong.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.