
Python way of setting an S3 subdirectory path and deleting it using boto

I have a path like below:

s3://edl-landing/lu/hello2/  

Under which I have two tables as shown below:

[Image: path]
Each table stores its Parquet data in the following folder layout:

[Image: format]

I want to delete the data for the 10th and 11th of November. Below is my Python code using boto3, but it doesn't delete the objects from S3, and I don't get any errors. I also tried various solutions from this link, but they don't delete the actual data from S3 either.

import boto3

prefix = "lu/hello2/"
s3 = boto3.resource('s3')
bucket = s3.Bucket(name="edl-landing")
FilesNotFound = True
blankList = []
for obj in bucket.objects.filter(Prefix=prefix):
    # print(obj.key)
    # the third path component of each key is the table name
    blankList.append(obj.key.split('/')[2])
blankList = set(blankList)  # 2 names
# drop the empty name that a key equal to the prefix itself would produce
blankList.discard("")
datesList = ['2020-11-10', '2020-11-11']
for i in blankList:
    for j in datesList:
        path = "s3://edl-landing/lu/hello2/" + i + "/edl_load_ts=" + j + "/"
        print(path)
        bucket.objects.filter(Prefix=path).delete()
        print("All the objects have been deleted for the mentioned dates...")

Where am I going wrong? I am running it via an EC2 instance.

You are passing a complete S3 URI to bucket.objects.filter(Prefix=path) .

The Prefix argument expects an S3 object key prefix, relative to the bucket, not an S3 URI.
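If the paths arrive as full URIs, one option is to split them before calling filter. The following is a minimal sketch, not part of the original answer: split_s3_uri is a hypothetical helper name, and it relies only on the standard library's urllib.parse.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into (bucket_name, key_prefix).

    Hypothetical helper for illustration. Keys containing '?' or '#'
    would be misparsed by urlparse and need separate handling.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The key prefix must not start with '/', because S3 keys are relative to the bucket.
    return parsed.netloc, parsed.path.lstrip("/")
```

The second element of the returned tuple is the value that can be passed as Prefix; the first is the bucket name.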

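This also explains why you saw no errors: no object key starts with "s3://", so bucket.objects.filter(Prefix=path) returned zero objects and delete() had nothing to delete. You could have verified this by iterating over the filtered collection and counting the results.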
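A quick way to run that check is to count what a prefix matches before deleting anything. This is a small sketch under the assumption that bucket is a boto3 Bucket resource such as the one in the question; count_matching is a hypothetical name introduced here.

```python
def count_matching(bucket, prefix):
    """Return how many objects the bucket lists under the given key prefix.

    'bucket' is assumed to be a boto3 Bucket resource, or any object that
    exposes objects.filter(Prefix=...) returning an iterable.
    """
    return sum(1 for _ in bucket.objects.filter(Prefix=prefix))
```

Printing count_matching(bucket, path) inside the loop would have shown 0 for every URI-style path.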

Pass the key prefix relative to the bucket instead: "lu/hello2/"+i+"/edl_load_ts="+j+"/"
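Put together, the corrected loop might look like the sketch below. The function names partition_prefixes and delete_partitions are hypothetical, the table names are supplied by the caller because the question does not show them, and bucket is again assumed to be a boto3 Bucket resource.

```python
from itertools import product


def partition_prefixes(base_prefix, tables, dates):
    """Build key prefixes of the form '<base_prefix><table>/edl_load_ts=<date>/'."""
    return [
        f"{base_prefix}{table}/edl_load_ts={date}/"
        for table, date in product(sorted(tables), dates)
    ]


def delete_partitions(bucket, base_prefix, tables, dates):
    """Delete every object under each partition prefix and return the prefixes used.

    'bucket' is assumed to be a boto3 Bucket resource.
    """
    prefixes = partition_prefixes(base_prefix, tables, dates)
    for prefix in prefixes:
        # The prefix is relative to the bucket: no 's3://' scheme and no bucket name.
        bucket.objects.filter(Prefix=prefix).delete()
    return prefixes
```

With bucket = boto3.resource('s3').Bucket('edl-landing'), calling delete_partitions(bucket, "lu/hello2/", blankList, ['2020-11-10', '2020-11-11']) would target the same partitions as the loop in the question, this time with prefixes that can match real keys.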
