
What is the issue with this Python script that deletes the contents of multiple S3 buckets concurrently and waits until they are deleted?

I am trying to create a Python script that deletes the contents of 6 S3 buckets concurrently, waits until all the data is deleted, and handles more than 1000 objects per bucket. However, I randomly encounter the error "KeyError: 'endpoint_resolver'". My AWS configuration is set up correctly, since I can list the S3 buckets with the AWS CLI. Can you help me resolve this issue?

The code I have written is as follows:

import boto3
import concurrent.futures

def delete_s3_bucket_contents(bucket_name):
    sess = boto3.session.Session()
    s3 = sess.client('s3')
    bucket = boto3.resource('s3').Bucket(bucket_name)
    objects_to_delete = [{'Key': obj.key} for obj in bucket.objects.all()]

    while objects_to_delete:
        response = s3.delete_objects(
            Bucket=bucket_name,
            Delete={
                'Objects': objects_to_delete[:1000],
                'Quiet': True
            }
        )
        objects_to_delete = objects_to_delete[1000:]

def delete_multiple_buckets(bucket_names, max_workers=6):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(delete_s3_bucket_contents, bucket) for bucket in bucket_names]
        concurrent.futures.wait(futures)
        for future in concurrent.futures.as_completed(futures):
            future.result()

bucket_names = ["A","B","C","D","E","F"]
delete_multiple_buckets(bucket_names)

I also tried to delete the data from the above 6 buckets simultaneously in bash:

  parallel -j 6 "aws s3api list-objects --bucket {} --query '{Contents: [Contents[].{Key: Key}]}' --output json | jq -r '.Contents[].Key' | xargs -I {} -n 1000 aws s3api delete-objects --bucket {} --delete '{\"Objects\":[{\"Key\":\"{}\"}],\"Quiet\":true}' " ::: "${destination_buckets[@]}"

but it was throwing a jq error:

jq: error (at <stdin>:189150): Cannot index array with string "Key"

I can run the `aws s3 rm` command, but it is very slow at deleting.

Following the general example, you can create the session and the client in your delete_multiple_buckets function and just pass the client to each worker. Sessions and resources are not thread-safe, so they should not be shared across threads, but a single client can be.

import boto3.session
from concurrent.futures import ThreadPoolExecutor, as_completed

def delete_s3_bucket_contents(client, bucket_name):
    # Put your thread-safe deletion code here, using the shared client
    ...

def delete_multiple_buckets(bucket_names, max_workers=6):
    # Create a session and use it to make our client
    session = boto3.session.Session()
    s3_client = session.client('s3')

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Just pass the shared client as an argument
        futures = [executor.submit(delete_s3_bucket_contents, s3_client, bucket) for bucket in bucket_names]
        # Wait for all workers and surface any exceptions
        for future in as_completed(futures):
            future.result()
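For reference, here is a minimal sketch of what the worker body could look like (this body is an assumption, not part of the original answer). It uses the shared client's `list_objects_v2` paginator, so buckets with more than 1000 keys are handled, and calls `delete_objects` once per page, which stays within the 1000-key limit of that API:

```python
def delete_s3_bucket_contents(client, bucket_name):
    # Paginate so buckets with more than 1000 objects are handled;
    # each page holds at most 1000 keys, the delete_objects limit.
    paginator = client.get_paginator('list_objects_v2')
    deleted = 0
    for page in paginator.paginate(Bucket=bucket_name):
        objects = [{'Key': obj['Key']} for obj in page.get('Contents', [])]
        if objects:
            # One batch-delete request per page of keys
            client.delete_objects(
                Bucket=bucket_name,
                Delete={'Objects': objects, 'Quiet': True}
            )
            deleted += len(objects)
    return deleted
```

Each future would then return the number of keys deleted from its bucket, so `future.result()` both surfaces exceptions and reports progress.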
