
Read a tar.gz compressed file from a file stream, uncompress it, and put it in another file stream without writing to disk

I have a large (~1 GB) directory with multiple files that is stored in S3 as a tar.gz archive and processed by a Lambda function.

The file system of the Lambda function is read-only, so I would like all operations to be done in memory.

I cannot bundle the archive into the Lambda function's image itself, as GitHub won't accept such large files.

Having Lambda read it from S3 seems reasonable, but I can't figure out how to uncompress it. Sorry, I'm a beginner.

Here is what I wrote:

import io
import tarfile

import boto3
import tensorflow as tf

# Define the resources to use
s3 = boto3.resource('s3', region_name='us-east-1')
bucket = s3.Bucket('tensorflow-models')
object = bucket.Object('saved-model.tar.gz')

# Prepare 2 file streams
file_stream1 = io.BytesIO()
file_stream2 = io.BytesIO()

# Download object to file stream
object.download_fileobj(file_stream1)

# Uncompress it
with tarfile.open(file_stream1, "r:gz") as tar:
    tar.extractall(file_stream2)

# Use it in Tensorflow
model = tf.keras.models.load_model(file_stream2)

# Get the result
result = model.call(embedded_sentences)

Here is the error message:

{
  "errorMessage": "expected str, bytes or os.PathLike object, not BytesIO",
  "errorType": "TypeError",
  "requestId": "xxxxxxxxxxxxxxxxxxx",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 87, in lambda_handler\n    with tarfile.open(file_stream1, \"r:gz\") as tar:\n",
    "  File \"/var/lang/lib/python3.9/tarfile.py\", line 1629, in open\n    return func(name, filemode, fileobj, **kwargs)\n",
    "  File \"/var/lang/lib/python3.9/tarfile.py\", line 1675, in gzopen\n    fileobj = GzipFile(name, mode + \"b\", compresslevel, fileobj)\n",
    "  File \"/var/lang/lib/python3.9/gzip.py\", line 173, in __init__\n    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')\n"
  ]
}

I don't think you can work with a 1 GB file from your Lambda, because its temp directory is limited to 512 MB (please check https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html ).

For large files like this, mount an EFS file system on the Lambda, or change the logic (explore another possibility, e.g. a worker instead of Lambda).
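Separately from the size limits, the `TypeError` in the question comes from passing the `BytesIO` object as the first positional argument of `tarfile.open`, which is the *filename*; a file-like object must go in via the `fileobj=` keyword, and the buffer must be rewound with `seek(0)` after `download_fileobj` fills it. A minimal sketch of the in-memory pattern (using a locally built archive to stand in for the S3 download, which is assumed here):

```python
import io
import tarfile

# Stand-in for the S3 download: build a small tar.gz archive in memory.
# In Lambda you would instead fill `buf` with object.download_fileobj(buf).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"hello model"
    info = tarfile.TarInfo(name="saved-model/weights.bin")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Rewind: download_fileobj (like writing) leaves the cursor at the end.
buf.seek(0)

# Key fix: pass the stream via fileobj=, not as the first positional
# argument (which tarfile would treat as a filename).
extracted = {}
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    for member in tar.getmembers():
        if member.isfile():
            # extractfile() returns a file-like object; read it into memory
            # instead of extracting to disk.
            extracted[member.name] = tar.extractfile(member).read()

print(extracted["saved-model/weights.bin"])  # prints b'hello model'
```

Note that this only solves the decompression step: `tf.keras.models.load_model` still expects a real filesystem path to a SavedModel directory, so in practice you would `tar.extractall(path=...)` into an EFS mount (or `/tmp` if the model fits) rather than into a second `BytesIO`.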
