
Fastest way to delete files in Amazon S3

With boto3, one can delete files in a bucket as below:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')

for obj in bucket.objects.all():
    if 'xyz' in obj.key:
        obj.delete()

This sends one REST API call per file. If you have a large number of files, this can take a long time.

Is there a faster way to do this?

The easiest way to delete files is by using Amazon S3 Lifecycle Rules. Simply specify the prefix and an age (e.g. 1 day after creation) and S3 will delete the files for you!

However, this is not necessarily the fastest way to delete them -- lifecycle rules run roughly once a day, so it can take up to 24 hours before the rule is executed.
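As a sketch, such a rule could be created with boto3's put_bucket_lifecycle_configuration. The bucket name and rule ID below are hypothetical, and note that a lifecycle filter matches a key prefix, not a substring as in the question's 'xyz' in obj.key test:

```python
# Lifecycle configuration that expires objects under a prefix one day
# after creation. It would be applied with:
#   boto3.client('s3').put_bucket_lifecycle_configuration(
#       Bucket='mybucket', LifecycleConfiguration=lifecycle_config)
lifecycle_config = {
    'Rules': [
        {
            'ID': 'expire-xyz-objects',    # hypothetical rule ID
            'Filter': {'Prefix': 'xyz'},   # matches a key prefix only
            'Status': 'Enabled',
            'Expiration': {'Days': 1},     # delete ~1 day after creation
        }
    ]
}
```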

If you really want to delete the objects yourself, use delete_objects() instead of delete_object(). It can accept up to 1000 keys per call, which will be faster than deleting each object individually.

There are many ways to accomplish what you are asking.

Use Python's list comprehension, to get the list of objects that meet your criteria:

myobjects = [{'Key':obj.key} for obj in bucket.objects.all() if 'xyz' in obj.key]

Once you store the objects to be deleted in myobjects, call bulk delete:

bucket.delete_objects(Delete={'Objects': myobjects})

delete_objects(**kwargs)

This operation enables you to delete multiple objects from a bucket using a single HTTP request. You may specify up to 1000 keys.

If there are more than 1000 keys, it is a matter of looping through the list, slicing off up to 1000 keys in each iteration and calling delete_objects() on each batch.
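That loop can be sketched as below; the chunks helper name is my own, and the commented usage assumes the bucket and myobjects list built above:

```python
def chunks(items, size=1000):
    """Yield successive slices of at most `size` items from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage with the objects gathered earlier:
# for batch in chunks(myobjects):
#     bucket.delete_objects(Delete={'Objects': batch})
```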

The legacy boto 2 library also provides support for MultiDelete via delete_keys(). Here's an example of how you would use it:

import boto.s3
conn = boto.s3.connect_to_region('us-east-1')  # or whatever region you want
bucket = conn.get_bucket('mybucket')
keys_to_delete = ['mykey1', 'mykey2', 'mykey3', 'mykey4']
result = bucket.delete_keys(keys_to_delete)  # result.deleted / result.errors report the outcome
