
python - gzip string and upload to s3

I have written a snippet that downloads a file from S3, modifies some XML data, and uploads it back to S3. The data is gzipped, so I decompress it first, modify it, and gzip it again. I can see that gzip returns some data (definitely not length 0), so why does the upload do this?

    import gzip
    import io

    import boto3
    from lxml import etree as et

    s3 = boto3.client('s3')
    s3Key = 'test'
    # bucketName and nodeName are defined elsewhere in the script
    try:
        bytes_buffer = io.BytesIO()
        s3.download_fileobj(Bucket=bucketName, Key=s3Key, Fileobj=bytes_buffer)
        bytes_buffer.seek(0)  # rewind before reading the downloaded bytes
        with gzip.GzipFile(fileobj=bytes_buffer) as gzipfile:
            content = gzipfile.read()
        xml = et.fromstring(content)
        for specialrequest in xml.xpath("(//*[local-name()='{}'])".format(nodeName)):
            # perform regex on specialrequest.text; hard-coded here for testing
            specialrequest.text = 'test_replacement'
        xml = et.tostring(xml)  # bytes
        byte_value = io.BytesIO()  # gzip output is binary, so BytesIO rather than StringIO
        with gzip.GzipFile(fileobj=byte_value, mode="wb") as f:
            f.write(xml)
        #s3.upload_fileobj(io.BytesIO(byte_value.getvalue()), bucketName, s3Key)
        response = s3.put_object(Body=byte_value.getvalue(), Bucket=bucketName, Key=s3Key)
        print(response)
        #print(byte_value.getvalue())
    except Exception as exc:
        print("Unexpected error:", exc)

The put succeeds, but the content length in the response is always 0:

    {u'VersionId': 'mHZJAS6b2ordFx802D4egd56VFZjACOI', u'ETag': '"5d8fa27c1e14fee5d12c6856cc0c2074"', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': 'Ig2nK1VtgURwGIHXXF8cgYqoUPrY/jW3ilhI8so9E9T0AKUn5Q3FX0IfrDsHanxqXS/4kO9Dje4=', 'RequestId': '1PY7DFWE37CACEM9', 'HTTPHeaders': {'content-length': '0', 'x-amz-id-2': 'Ig2nK1VtgURwGIHXXF8cgYqoUPrY/jW3ilhI8so9E9T0AKUn5Q3FX0IfrDsHanxqXS/4kO9Dje4=', 'server': 'AmazonS3', 'x-amz-request-id': '1PY7DFWE37CACEM9', 'etag': '"5d8fa27c1e14fee5d12c6856cc0c2074"', 'date': 'Tue, 22 Jun 2021 02:34:48 GMT', 'x-amz-version-id': 'mHZJAS6b2ordFx802D4egd56VFZjACOI'}}}
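The 'content-length': '0' entry in that output sits under HTTPHeaders, i.e. it is a header of the PutObject HTTP response, whose body is empty; it does not describe the size of the uploaded object. A minimal sketch for checking the stored size instead, assuming `s3_client` is a boto3 S3 client: it calls `head_object` after `put_object` and returns the object's `ContentLength`. The helper name `put_and_check_size` is hypothetical.

```python
from typing import Any


def put_and_check_size(s3_client: Any, bucket: str, key: str, body: bytes) -> int:
    """Upload body with put_object, then return the size S3 reports for the object.

    The 'content-length' header in the PutObject response describes that
    response's own (empty) body, so it is 0 even for a successful upload.
    head_object's ContentLength is the stored object's size in bytes.
    """
    s3_client.put_object(Body=body, Bucket=bucket, Key=key)
    head = s3_client.head_object(Bucket=bucket, Key=key)
    return head["ContentLength"]
```

Usage with the question's variables would look like `put_and_check_size(boto3.client('s3'), bucketName, s3Key, byte_value.getvalue())`; the returned size should match `len(byte_value.getvalue())`.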

EDIT:

After switching to zlib for compression, I was able to upload the file with the expected size (the same as the gzip file I downloaded). However, when I try to unzip it locally to validate the data, it keeps turning into a .cpgz file for some reason:

    import zlib

    xml = et.tostring(xml)  # lxml's tostring already returns bytes
    compressed = zlib.compress(xml)
    response = s3.put_object(Body=compressed, Bucket=bucketName, Key=s3Key)
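The .cpgz result is consistent with a container mismatch: `zlib.compress` writes the zlib format (RFC 1950), while .gz tools expect the gzip format (RFC 1952), so the local unzip step apparently does not recognise the file as gzip. The sketch below, using a made-up XML payload, compares the first two bytes of each format, shows that a gzip reader rejects the zlib output, and shows that `zlib.compressobj(wbits=31)` emits a proper gzip container.

```python
import gzip
import zlib

# Made-up payload standing in for the modified XML from the question.
payload = b"<root><specialrequest>test_replacement</specialrequest></root>"

# zlib.compress() writes the zlib container (RFC 1950): 2-byte header,
# deflate stream, Adler-32 checksum. This is NOT a .gz file.
zlib_data = zlib.compress(payload)

# gzip.compress() writes the gzip container (RFC 1952), which begins with
# the magic bytes 1f 8b that gunzip and archive tools look for.
gzip_data = gzip.compress(payload)

# zlib can emit a gzip container too: wbits=31 (16 + 15) selects the gzip wrapper.
compressor = zlib.compressobj(wbits=31)
gzip_via_zlib = compressor.compress(payload) + compressor.flush()

print(zlib_data[:2].hex())      # 789c (default compression level)
print(gzip_data[:2].hex())      # 1f8b
print(gzip_via_zlib[:2].hex())  # 1f8b

# gzip readers reject the zlib container...
try:
    gzip.decompress(zlib_data)
except OSError as exc:  # BadGzipFile (Python 3.8+) is a subclass of OSError
    print("not a gzip file:", exc)

# ...while both gzip outputs round-trip back to the original bytes.
assert gzip.decompress(gzip_data) == payload
assert gzip.decompress(gzip_via_zlib) == payload
```

In the question's code, uploading the output of `gzip.compress` (or of the `GzipFile`/`BytesIO` approach) keeps the object a real gzip file.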

Try the following, assuming `root` is the parsed XML root element (the same object the question calls `xml`):

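    import gzip
    import xml.etree.ElementTree as ET

    import boto3

    # root is the parsed XML root element; bucketName and s3Key are as in the question
    xml_string = ET.tostring(root, encoding='utf-8').decode('utf-8')
    print(xml_string)  # Optional

    xml_byte = bytes(xml_string, 'utf-8')  # gzip.compress takes bytes, not str

    gzip_compressed = gzip.compress(xml_byte)

    s3 = boto3.client('s3')
    response = s3.put_object(Body=gzip_compressed, Bucket=bucketName, Key=s3Key)

    # put_object raises on failure; the status code confirms the PUT succeeded
    if response['ResponseMetadata']['HTTPStatusCode'] == 200:
        print("file uploaded successfully")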
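To test the transformation locally without S3, the whole round trip (gunzip, edit, re-gzip) can be written as a pure function over bytes. This is a sketch under some assumptions: it swaps lxml's `local-name()` XPath for the standard library's `xml.etree.ElementTree` plus a manual namespace strip, and `rewrite_gzipped_xml` and `local_name` are hypothetical helper names.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix, similar to XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def rewrite_gzipped_xml(gz_bytes: bytes, node_name: str, new_text: str) -> bytes:
    """Gunzip an XML document, set the text of every element whose local
    name is node_name, and gzip the serialized result again."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    for elem in root.iter():
        if local_name(elem.tag) == node_name:
            elem.text = new_text
    xml_bytes = ET.tostring(root, encoding="utf-8")  # bytes, as gzip.compress needs
    return gzip.compress(xml_bytes)


if __name__ == "__main__":
    # Made-up document; in the question these bytes come from download_fileobj.
    original = gzip.compress(b'<r xmlns="urn:example"><SpecialRequest>old</SpecialRequest></r>')
    updated = rewrite_gzipped_xml(original, "SpecialRequest", "test_replacement")
    print(updated[:2].hex())  # 1f8b: the result is still gzip
    print(gzip.decompress(updated).decode("utf-8"))
```

The returned bytes can go straight into `put_object(Body=...)`. One design caveat: when serializing, ElementTree renames a default namespace to an `ns0:` prefix unless it is registered first with `ET.register_namespace("", uri)`, whereas lxml (used in the question) keeps the document's original prefixes.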
