
How to copy individual objects between S3 buckets quickly?

We have a Lambda function that is triggered for some S3 files and is supposed to copy them to a different bucket. The code basically looks like this:

import boto3

def handler(event, context):
    boto3.client("s3").copy_object(Bucket="target-bucket", Key="5_gb.data", CopySource={"Bucket": "source-bucket", "Key": "5_gb.data"})

A CopyObject operation does not download the object into the Lambda and upload it again; the copy is handled entirely by S3. I would therefore expect it to finish quickly. Instead, the Lambda (configured with, e.g., 1024 MB of RAM) times out after 15 minutes and the object never appears in the target bucket.

If I copy the object via

aws s3 cp s3://source-bucket/5_gb.data s3://target-bucket/5_gb.data

the copy finishes after roughly 2.5 minutes.

Why is the Python code so much slower than the AWS CLI call?

The copy operation is handled by S3 internally, but a single CopyObject request is quite slow for large objects.

The CLI is built on the same Python libraries (botocore and s3transfer), but it copies files differently: if the object is large enough, it uses the multipart upload / copy operation and copies parts of the object in parallel, which gives far higher copy performance.
If you inspect the aws-cli code you can see that it uses the TransferManager from the s3transfer package. You can do exactly the same and rewrite your Lambda as:

import boto3
from s3transfer.manager import TransferManager, TransferConfig

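def handler(event, context):
    # Allow up to 20 concurrent requests so that parts are copied in parallel.
    manager = TransferManager(boto3.client("s3"), TransferConfig(max_request_concurrency=20))
    # copy() returns a future; result() blocks until the copy has finished and raises if it failed.
    manager.copy(bucket="target-bucket", key="5_gb.data", copy_source={"Bucket": "source-bucket", "Key": "5_gb.data"}).result()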

With this change the Lambda achieves copy speeds similar to the local CLI invocation. In my testing it was sufficient to provision the Lambda with 512 MB of RAM, and it copied the file without getting close to the timeout.
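
For intuition about what the parallel multipart copy does, here is a minimal sketch of how an object can be split into byte ranges, each of which could then be copied by its own UploadPartCopy request. The function name plan_copy_parts and the 8 MiB default part size are assumptions made for illustration; this is not the code the CLI or s3transfer actually runs.

```python
def plan_copy_parts(object_size, part_size=8 * 1024 * 1024):
    """Return (part_number, copy_source_range) pairs covering the whole object.

    Hypothetical helper for illustration only. The range strings use the
    inclusive, zero-based "bytes=first-last" form expected by the
    CopySourceRange parameter of UploadPartCopy.
    """
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")

    parts = []
    start = 0
    part_number = 1  # S3 part numbers start at 1
    while start < object_size:
        # The last part may be shorter than part_size.
        end = min(start + part_size, object_size) - 1
        parts.append((part_number, f"bytes={start}-{end}"))
        start = end + 1
        part_number += 1
    return parts


if __name__ == "__main__":
    # Tiny example: a 10-byte object split into 4-byte parts.
    for number, byte_range in plan_copy_parts(10, 4):
        print(number, byte_range)
```

In a hand-written multipart copy, each range would be passed as CopySourceRange to upload_part_copy, between a create_multipart_upload call and a final complete_multipart_upload call. Because the ranges are independent of each other, the requests can run concurrently, which is where the speedup over a single CopyObject call comes from. The TransferManager shown above takes care of all of this for you.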
