I'm using Boto3 in AWS Lambda to process a data stream and publish the contents to a file in S3 for downstream processing. In this case the data is simple raw JSON. I would like to use zlib to store the data in S3 as compressed gzip. In theory this is simple, but when I upload a file compressed with the following code, my local machine says the file is not in gzip format.
Can someone explain what is going on here? This should be trivial. For what it's worth, when I read gzipped files that other programs produced, zlib.decompress requires 16+zlib.MAX_WBITS as its wbits argument in order to correctly read the compressed string. Perhaps I need the zlib.compress equivalent?
import json
import zlib
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    ## Sample dataset
    data = [{"var": 1, "foo": "bar"}, {"var": 2, "foo": "baz"}]
    payload = '\n'.join([json.dumps(r) for r in data]).encode('utf-8')

    ## Upload
    output = s3.Object("bucket", "file")
    output.put(Body=zlib.compress(payload))

    ## Download and verify
    obj = s3.Object("bucket", "file")

    ## Load the streaming object body, decompress, decode
    ## (passing 16+zlib.MAX_WBITS here raises an error)
    decompressed = zlib.decompress(obj.get()['Body'].read()).decode('utf-8').split("\n")
    print(f"Decompressed payload: {decompressed}")

    data2 = [json.loads(r) for r in decompressed]
    return {
        "statusCode": 200,
        "TestVerification?": data2 == data,
        "body": json.dumps('Demo')
    }
Later, download the file locally:
zcat testcompressed.gz
gzip: testcompressed.gz: not in gzip format
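For reference, the difference is visible in the stream headers. This is a minimal stdlib-only sketch, separate from the Lambda code above, showing that zlib.compress output is not gzip and that gzip data needs the extra wbits flag to decompress:

```python
import gzip
import zlib

payload = b'{"var": 1, "foo": "bar"}'

# zlib.compress emits a zlib-wrapped stream (RFC 1950): first byte 0x78
zlib_data = zlib.compress(payload)
print(zlib_data[:1].hex())  # 78

# gzip.compress emits a gzip stream (RFC 1952): magic bytes 0x1f 0x8b,
# which is what zcat and gzip expect
gzip_data = gzip.compress(payload)
print(gzip_data[:2].hex())  # 1f8b

# Reading the gzip stream via zlib requires wbits=16+zlib.MAX_WBITS
print(zlib.decompress(gzip_data, 16 + zlib.MAX_WBITS) == payload)  # True
```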
Yes, you'd need the zlib.compress equivalent. However, there isn't one; you instead need to use zlib.compressobj, which does have a wbits parameter.
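A minimal sketch of that fix, using the same wbits value on both sides (the sample payload here stands in for the S3 body):

```python
import zlib

payload = b'{"var": 1, "foo": "bar"}\n{"var": 2, "foo": "baz"}'

# wbits=16+zlib.MAX_WBITS tells zlib to emit a gzip header and trailer
compressor = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
gzipped = compressor.compress(payload) + compressor.flush()

# The result starts with the gzip magic bytes, so zcat will accept it,
# and it round-trips with the same wbits value on the decompress side
print(gzipped[:2].hex())                                         # 1f8b
print(zlib.decompress(gzipped, 16 + zlib.MAX_WBITS) == payload)  # True
```

Pass `gzipped` as the `Body` in `output.put(...)`. Alternatively, the stdlib's `gzip.compress(payload)` produces the same format in a single call.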