简体   繁体   中英

AWS CLI S3API find newest folder in path

I've got a very large bucket (hundreds of thousands of objects). I've got a path (lets say s3://myBucket/path1/path2). /path2 gets uploads that are also folders. So a sample might look like:

s3://myBucket/path1/path2/v6.1.0
s3://myBucket/path1/path2/v6.1.1
s3://myBucket/path1/path2/v6.1.102
s3://myBucket/path1/path2/v6.1.2
s3://myBucket/path1/path2/v6.1.25
s3://myBucket/path1/path2/v6.1.99

S3 doesn't take into account version number sorting (which makes sense) but alphabetically the last in the list is not the last uploaded. In that example .../v6.1.102 is the newest.

Here's what I've got so far:

aws s3api list-objects 
--bucket myBucket
--query "sort_by(Contents[?contains(Key, \`path1/path2\`)],&LastModified)"´ 
--max-items 20000

So one problem here is max-items seems to start alphabetically from the all files recursively in the bucket. 20000 does get to my files but it's a pretty slow process to go through that many files.

So my questions are twofold:

1 - This is still searching the whole bucket but I just want to narrow it down to path2/ . Can I do this?

2 - This lists just objects, is it possible to pull up just a path list instead?

Basically the end goal is I just want a command to return the newest folder name like 'v6.1.102' from the example above.

To answer #1, you could add the --prefix path1/path2 to limit what you're querying in the bucket.

In terms of sorting by last modified, I can only think of using an SDK to combine the list_objects_v2 and head_object (boto3) to get last modified on the objects and programmatically sort

Update

Alternatively, you could reverse sort by LastModified in jmespath and return the first item to give you the most recent object and gather the directory from there.

aws s3api list-objects-v2 \
--bucket myBucket \
--prefix path1/path2 \
--query 'reverse(sort_by(Contents,&LastModified))[0]'

If you want general purpose querying eg "lowest version", "highest version", "all v6.x versions" then consider maintaining a separate database with the version numbers.

If you only need to know the highest version number and you need that to be retrieved quickly (quicker than a list object call) then you could maintain that version number independently. For example, you could use a Lambda function that responds to objects being uploaded to path1/path2 where the Lambda function is responsible for storing the highest version number that it has seen into a file at s3://mybucket/version.max.

Prefix works with list_object using boto3 client. But using boto3 resource might give some issues. Paginator in pagination is a great concept and works nice!. to find the latest changes(additions of objects) : sort_by(contents)[-1]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM