
AWS CLI S3API find newest folder in path

I've got a very large bucket (hundreds of thousands of objects). I've got a path (let's say s3://myBucket/path1/path2). /path2 gets uploads that are themselves folders. So a sample might look like:

s3://myBucket/path1/path2/v6.1.0
s3://myBucket/path1/path2/v6.1.1
s3://myBucket/path1/path2/v6.1.102
s3://myBucket/path1/path2/v6.1.2
s3://myBucket/path1/path2/v6.1.25
s3://myBucket/path1/path2/v6.1.99

S3 doesn't take version-number ordering into account (which makes sense), so the alphabetically last entry in the list is not the last one uploaded. In this example .../v6.1.102 is the newest.

Here's what I've got so far:

aws s3api list-objects \
--bucket myBucket \
--query "sort_by(Contents[?contains(Key, \`path1/path2\`)],&LastModified)" \
--max-items 20000

One problem here is that --max-items seems to start alphabetically from all files in the bucket, recursively. 20000 does reach my files, but going through that many files is a pretty slow process.

So my questions are twofold:

1 - This is still searching the whole bucket, but I just want to narrow it down to path2/. Can I do this?

2 - This lists just objects. Is it possible to pull up just a list of paths instead?

Basically, the end goal is a single command that returns the newest folder name, like 'v6.1.102' from the example above.

To answer #1, you could add --prefix path1/path2 to limit what you're querying in the bucket.

In terms of sorting by last modified, I can only think of using an SDK to combine list_objects_v2 and head_object (boto3) to get the last-modified time of the objects and sort them programmatically.
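A minimal sketch of that SDK approach, assuming the folder of interest is the first path segment below the prefix. Each entry returned by list_objects_v2 already carries a LastModified field, so this sketch does not call head_object. The helper names newest_folder and iter_objects are made up for illustration.

```python
def newest_folder(objects, prefix):
    """Return the first path segment under `prefix` of the newest object."""
    newest_name, newest_time = None, None
    for obj in objects:
        key = obj["Key"]
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Skip objects that sit directly in the prefix, not in a sub-folder.
        if "/" not in rest:
            continue
        if newest_time is None or obj["LastModified"] > newest_time:
            newest_name, newest_time = rest.split("/", 1)[0], obj["LastModified"]
    return newest_name


def iter_objects(bucket, prefix):
    """Yield every object under `prefix`, following pagination."""
    import boto3  # lazy import so newest_folder works without boto3 installed

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])


# Usage (needs AWS credentials):
# print(newest_folder(iter_objects("myBucket", "path1/path2/"), "path1/path2/"))
```

This still reads every object under the prefix, so it is only as fast as the listing itself, but it no longer scans the rest of the bucket.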

Update

Alternatively, you could reverse sort by LastModified in JMESPath and return the first item, which gives you the most recent object; you can then derive the directory from its key.

aws s3api list-objects-v2 \
--bucket myBucket \
--prefix path1/path2 \
--query 'reverse(sort_by(Contents,&LastModified))[0]'
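For question #2, one option is to pass a delimiter so that S3 returns only the folder prefixes (CommonPrefixes) rather than every object. The sketch below assumes folder names follow the vX.Y.Z pattern from the question and picks the highest version number, which is not necessarily the last one uploaded. The helper names are hypothetical.

```python
import re


def version_key(name):
    """Turn 'v6.1.102' into (6, 1, 102) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", name))


def highest_version(prefixes, parent):
    """Pick the highest-versioned folder name from CommonPrefixes-style strings."""
    names = [p[len(parent):].rstrip("/") for p in prefixes if p.startswith(parent)]
    return max(names, key=version_key, default=None)


def list_folders(bucket, parent):
    """Return the immediate sub-folder prefixes under `parent` (needs boto3)."""
    import boto3  # lazy import: the helpers above have no AWS dependency

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=parent, Delimiter="/")
    return [cp["Prefix"] for page in pages for cp in page.get("CommonPrefixes", [])]


# Usage (needs AWS credentials):
# print(highest_version(list_folders("myBucket", "path1/path2/"), "path1/path2/"))
```

Note the trailing slash on the parent prefix: with Delimiter="/" it is what makes S3 group the keys one level below path2.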

If you want general-purpose querying, e.g. "lowest version", "highest version", or "all v6.x versions", then consider maintaining a separate database of the version numbers.

If you only need to know the highest version number, and you need to retrieve it quickly (quicker than a list-objects call), then you could maintain that version number independently. For example, you could use a Lambda function that responds to objects being uploaded to path1/path2, where the Lambda function is responsible for storing the highest version number it has seen in a file at s3://mybucket/version.max.
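A rough sketch of such a Lambda handler, assuming it is triggered by S3 ObjectCreated events for the path1/path2/ prefix and that version.max stores the number as plain text such as 6.1.102. The constants, the parse_version helper and the optional s3 parameter (there only to ease local testing) are illustrative assumptions.

```python
import re
from urllib.parse import unquote_plus

BUCKET = "mybucket"      # bucket name taken from the example path above
MAX_KEY = "version.max"  # object holding the highest version seen so far
VERSION_RE = re.compile(r"path1/path2/v(\d+(?:\.\d+)*)/")


def parse_version(key):
    """Return e.g. (6, 1, 102) for a key under path1/path2/v6.1.102/, else None."""
    match = VERSION_RE.match(key)
    return tuple(int(n) for n in match.group(1).split(".")) if match else None


def handler(event, context, s3=None):
    """Keep MAX_KEY at the highest version number uploaded so far."""
    if s3 is None:
        import boto3  # available in the Lambda Python runtime
        s3 = boto3.client("s3")

    try:
        body = s3.get_object(Bucket=BUCKET, Key=MAX_KEY)["Body"].read().decode().strip()
        current = tuple(int(n) for n in body.split(".")) if body else ()
    except s3.exceptions.NoSuchKey:
        current = ()  # nothing stored yet

    # Object keys arrive URL-encoded in S3 event notifications.
    keys = [unquote_plus(r["s3"]["object"]["key"]) for r in event["Records"]]
    seen = [v for v in map(parse_version, keys) if v]
    best = max(seen + [current])
    if best > current:
        text = ".".join(str(n) for n in best)
        s3.put_object(Bucket=BUCKET, Key=MAX_KEY, Body=text.encode())
    return best
```

Concurrent invocations could race on the read-then-write step, so treat this as a starting point rather than a finished design.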

Prefix works with list_objects when using the boto3 client, but using the boto3 resource might cause some issues. The paginator is a great concept and works nicely! To find the latest change (the most recently added object): sort_by(Contents, &LastModified)[-1]
