How to delete a folder and its contents in an AWS bucket using boto3

The documentation is a bit ambiguous about how to delete the contents of a folder. In the boto3 documentation, key is never defined in the preceding sections; it is only defined in the boto2 examples.

What's a flexible idiom (one that handles more than 1,000 files) for deleting the contents of a folder?

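I don't think you could delete 1000+ items in a single call in boto2 either. From a boto3 perspective, however, you could try the following, which issues the delete requests in batches behind the scenes and so is not limited to 1,000 objects: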

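import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
# Prefix is a plain string match; the trailing slash keeps "path/to/dir2/..." out of the deletion.
bucket.objects.filter(Prefix="path/to/dir/").delete()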

The above was tested and works:

>>> import boto3
>>> s3 = boto3.resource('s3')
>>> b = s3.Bucket('MY_BUCKET_NAME')
>>> b.objects.filter(Prefix="test/stuff")
s3.Bucket.objectsCollection(s3.Bucket(name='MY_BUCKET_NAME'), s3.ObjectSummary)
>>> list(b.objects.filter(Prefix="test/stuff"))
[s3.ObjectSummary(bucket_name='MY_BUCKET_NAME', key=u'test/stuff/new')]
>>> b.objects.filter(Prefix="test/stuff").delete()
[{u'Deleted': [{u'Key': 'test/stuff/new'}], 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': 'BASE64_ID_1', 'RequestId': 'REQ_ID', 'HTTPHeaders': {'x-amz-id-2': 'BASE64_ID_2', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'connection': 'close', 'x-amz-request-id': 'REQ_ID', 'date': 'Fri, 12 May 2017 21:21:47 GMT', 'content-type': 'application/xml'}}}]
>>>
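
When more control is wanted than the one-line collection delete gives (for example, checking per-key errors), the same batching can be spelled out with the low-level client. This is a minimal sketch, not the method used in the answer above: `delete_prefix` is a hypothetical helper name, and it assumes a boto3 S3 client is passed in. It relies on `list_objects_v2` returning at most 1,000 keys per page, which matches the 1,000-key limit of a single `delete_objects` request.

```python
def delete_prefix(client, bucket, prefix):
    """Delete every object whose key starts with `prefix`; return how many were deleted.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    deleted = 0
    paginator = client.get_paginator("list_objects_v2")
    # Each page holds at most 1,000 keys, which is also the most that a
    # single delete_objects request accepts.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        contents = page.get("Contents", [])  # "Contents" is absent when nothing matches
        if not contents:
            continue
        response = client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": obj["Key"]} for obj in contents], "Quiet": True},
        )
        # In quiet mode the response lists only the keys that failed.
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError(f"failed to delete {len(errors)} object(s): {errors[:3]}")
        deleted += len(contents)
    return deleted


# Usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   delete_prefix(boto3.client("s3"), "MY_BUCKET_NAME", "test/stuff/")
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise against a stub before pointing it at a real bucket.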

What's a flexible idiom (one that handles more than 1,000 files) for deleting the contents of a folder?

There isn't one.

The primary resources in S3 are objects (identified by key) in buckets.

Folders are not resources, and not containers; they are imaginary constructs created for convenience by the presence of / delimiters within the object key. (An "empty" folder, such as one created by the console, is simply a zero-byte object whose key ends with /.)

As such, there is no idiom for "delete a folder and all of its contents." Even the console accomplishes this by sending delete or multi-object delete requests (limited to 1,000 keys per request) to the API.
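
To make the "folders are just key prefixes" point concrete, here is a small offline sketch. It uses a hypothetical list of keys and plain string matching to imitate what a Prefix filter selects; `keys_under` and `as_folder_prefix` are illustrative names, not part of boto3.

```python
def keys_under(keys, prefix):
    """Return the keys a Prefix filter would match: a plain string-prefix test."""
    return [k for k in keys if k.startswith(prefix)]


def as_folder_prefix(path):
    """Append a trailing '/' (if missing) so only keys 'inside' the folder match."""
    return path if path.endswith("/") else path + "/"


# Hypothetical bucket contents, for illustration only.
keys = [
    "path/to/dir/",           # zero-byte "folder" marker, as the console creates
    "path/to/dir/a.txt",
    "path/to/dir/sub/b.txt",
    "path/to/dir2/c.txt",     # a sibling "folder" that shares the same leading characters
]

print(keys_under(keys, "path/to/dir"))                    # also picks up path/to/dir2/c.txt
print(keys_under(keys, as_folder_prefix("path/to/dir")))  # only keys under path/to/dir/
```

The practical consequence is that a prefix without a trailing slash can sweep up sibling "folders" whose names merely start the same way.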

A lifecycle policy can also be used to delete all objects with a given key prefix. Its time granularity is in days: objects are removed within the specified number of days after they were created, +1/-0 days. They may persist for essentially up to 23:59:59 longer than the timing specified, because policies are evaluated only once per day, not in real time.
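
A lifecycle rule of this kind could be set up from boto3 roughly as follows. This is a sketch under assumptions: `expiration_rule` and `apply_expiration` are hypothetical helper names, the rule ID is an arbitrary label, and `client` is assumed to be a boto3 S3 client.

```python
def expiration_rule(prefix, days, rule_id="expire-prefix"):
    """Build one lifecycle rule that expires objects under `prefix` after `days` days.

    `rule_id` is an arbitrary label chosen for this example.
    """
    if days < 1:
        raise ValueError("days must be a positive whole number")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def apply_expiration(client, bucket, prefix, days):
    """Install the rule on `bucket`; `client` is assumed to be a boto3 S3 client.

    Caution: this call replaces the bucket's whole lifecycle configuration,
    so any existing rules must be added to "Rules" if they are to be kept.
    """
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [expiration_rule(prefix, days)]},
    )


# Usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   apply_expiration(boto3.client("s3"), "MY_BUCKET_NAME", "test/stuff/", 1)
```

Unlike an explicit delete, nothing is removed at the moment the rule is installed; S3 expires the matching objects later, on its own daily schedule.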

You can also do it with the AWS CLI (https://aws.amazon.com/cli/) and some Unix commands.

This AWS CLI command should work:

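aws s3 rm s3://<your_bucket_name> --recursive --exclude "*" --include "<your_pattern>"

The --recursive flag, already included above, is what makes the command descend into sub-folders. Note that --include and --exclude take shell-style wildcard patterns (*, ?, [...]), not regular expressions.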

Or with Unix commands:

aws s3 ls s3://<your_bucket_name>/ | awk '{print $4}' | xargs -I% <your_os_shell> -c 'aws s3 rm s3://<your_bucket_name>/%'

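Explanation:

  1. List all files in the bucket and pipe the listing to the next command.
  2. Take the 4th field of each line (the file name) and pipe it on. You can insert a filter command here (for example, grep) to match your pattern. Names containing spaces will be cut off at the first space.
  3. Run the delete command with the AWS CLI once per file name.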
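
The same list, filter, delete pipeline can be sketched in Python. The helper names below (`select_keys`, `batches`, `delete_matching`) are hypothetical, `client` is assumed to be a boto3 S3 client, and `fnmatchcase` only approximates the CLI's --include matching, whose rules may differ in detail. Splitting into batches of 1,000 respects the multi-object delete limit mentioned earlier in the thread.

```python
from fnmatch import fnmatchcase


def select_keys(keys, pattern):
    """Keep keys matching a shell-style wildcard pattern (approximating --include)."""
    return [k for k in keys if fnmatchcase(k, pattern)]


def batches(items, size=1000):
    """Yield consecutive slices of at most `size` items (the multi-object delete limit)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_matching(client, bucket, keys, pattern):
    """Delete the matching keys in batches; return how many delete requests were sent.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    requests_sent = 0
    for batch in batches(select_keys(keys, pattern)):
        client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
        )
        requests_sent += 1
    return requests_sent
```

Because the keys are handled as whole strings rather than whitespace-separated fields, names containing spaces pass through intact.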
