Bulk copy with boto3

Boto3 has a managed copy method, which works pretty nicely for individual objects. Similarly, it seems to have a delete() method that works on a collection. But if I have a collection of objects (see objects below), it seems like the only way I can do a bulk operation is to use my own thread/process pool (I'm using multiprocessing for simplicity, but concurrent.futures would likely be better for error handling).

import boto3
import multiprocessing

bucket_name = '1000genomes'
prefix = 'changelog_details/'
bucket = boto3.resource('s3').Bucket(bucket_name)
objects = bucket.objects.filter(Prefix=prefix).limit(30)
sources = [{'Bucket': o.bucket_name, 'Key': o.key} for o in objects]

target_bucket = 'my-bucket'  # fill in bucket here!

def copy_to_bucket(src, bucket=target_bucket):
    # build a fresh resource per call: per the docs, boto3 resources and
    # sessions are not thread-safe (low-level clients generally are)
    s3 = boto3.resource('s3')
    return s3.meta.client.copy(src, bucket, src['Key'])

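if __name__ == '__main__':
    # the guard stops worker processes from re-creating the pool on import
    with multiprocessing.Pool(20) as pool:
        results = pool.map(copy_to_bucket, sources)
    print('Copied %d results' % len(results))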

Have I missed something about how to do this in bulk? I was attempting to look through the aws-cli codebase and/or the S3Transfer class, but both of them seem to be focused on uploading or downloading files.

Failing that, any thoughts on whether threads or processes are the better choice here? (I'd think the majority of a server-side copy would be just waiting for network I/O regardless).

I had to solve this problem a while ago, and while I was preparing to work on it I wrote this DesignDoc.

Threads will be your best friend here, because this is an I/O-bound problem. I wrote my own implementation of concurrent copying in S3 in S3-migrator. I also needed to keep track of which files had been copied, and used MySQL for that because of our usage.
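For illustration, here is a minimal sketch of the threaded approach. It is not the S3-migrator code. It assumes a single low-level client can be shared across threads (the boto3 docs describe clients as generally thread-safe, unlike resources and sessions). The helper name copy_all and its return shape are my own invention for this example, and the state tracking mentioned above is left out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_all(client, sources, target_bucket, max_workers=20):
    """Copy every source object into target_bucket, keeping the same key.

    `client` is anything exposing a boto3-style copy(CopySource, Bucket, Key)
    method, e.g. boto3.client('s3'). `sources` is a list of
    {'Bucket': ..., 'Key': ...} dicts, as built in the question.
    Returns (copied_keys, failures); failures maps a key to its exception.
    (copy_all is a hypothetical helper, not part of boto3.)
    """
    copied, failures = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its key so errors can be reported per object.
        futures = {
            executor.submit(client.copy, src, target_bucket, src['Key']): src['Key']
            for src in sources
        }
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()  # re-raises whatever the copy raised
            except Exception as exc:
                failures[key] = exc
            else:
                copied.append(key)
    return copied, failures


# Usage with the names from the question (requires boto3 and AWS credentials):
#   import boto3
#   copied, failures = copy_all(boto3.client('s3'), sources, target_bucket)
#   print('Copied %d objects, %d failed' % (len(copied), len(failures)))
```

Collecting failures per key instead of letting the first exception abort the run is the main thing concurrent.futures buys you over pool.map here. You can retry just the failed keys afterwards.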
