
Can S3 ListObjectsV2 return the keys sorted newest to oldest?

I have AWS S3 buckets with hundreds of top-level prefixes (folders). Each prefix contains somewhere between five thousand and a few million files, most growing at a rate of 10-100k per year. 99% of the time, all I care about are the newest 1-2000 or so in each folder...

Using ListObjectsV2 returns me 1000 files and that is the max (setting "MaxKeys" to a higher value still truncates the list at 1000). This would be reasonably fine, however (per the documentation) it's returning me the file list in ascending alphabetical order (which, given my keys/filenames have the date in them, effectively results in an oldest->newest sort)... which is considerably less useful than if it returned me the NEWEST files (or reverse-alphabetical).

One option is to use continuation tokens to pull the entire prefix, then use the tail end of the entire array of keys as needed... but that would be (most importantly) slow for large 'folders'. A prefix with 2 million files would require 2,000 separate API calls, just to get the newest few-hundred filenames. (Not to mention the costs incurred by pulling the entire bucket listing even though I'm only really interested in the newest 1-2000 files.)
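For reference, the continuation approach described above might look roughly like the sketch below. It is not a recommendation, only an illustration of why the cost scales with the size of the prefix. The function name `listNewestKeys` and the injected `listPage` callback are hypothetical; `listPage` stands in for one ListObjectsV2 request so the loop can be exercised without network access.

```javascript
// Sketch: page through a whole prefix and keep only the last `keep` keys.
// `listPage` is a hypothetical injected function that performs one
// ListObjectsV2 request and resolves to its response, e.g. with AWS SDK v3:
//   const { S3Client, ListObjectsV2Command } = require("@aws-sdk/client-s3");
//   const s3 = new S3Client({});
//   const listPage = (params) => s3.send(new ListObjectsV2Command(params));
// Assumes keep >= 1.
async function listNewestKeys(listPage, bucket, prefix, keep) {
  let tail = [];
  let token; // undefined on the first request
  let calls = 0;
  do {
    const page = await listPage({
      Bucket: bucket,
      Prefix: prefix,
      ContinuationToken: token,
    });
    calls += 1;
    const keys = (page.Contents || []).map((obj) => obj.Key);
    // Keys arrive in ascending order, so the newest are always at the end.
    tail = tail.concat(keys).slice(-keep);
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  // Reverse so the newest key comes first; `calls` shows the request count.
  return { keys: tail.reverse(), calls };
}
```

Every page still has to be fetched in order, because each continuation token comes from the previous response, so the calls cannot be issued in parallel for a single prefix.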

Is there a way to have the ListObjectsV2 call (or any other S3 call) give me the list of the newest (or reverse-alphabetical) files? New files come in every few minutes, and the most important file is THE most recent file, so doing an S3 Inventory doesn't seem like it would do the trick.

(Or, perhaps, a call that gives me filenames in a creation-date range...?)

Using JavaScript, but I'm sure every language has more-or-less the same features when it comes to trying to list objects from an S3 bucket.

Edit: weird idea: If AWS doesn't offer a 'sort' option on a basic API call for one of its most popular services... Would it make sense to record all the filenames/keys in a DynamoDB table and query that instead?

No. ListObjectsV2() always returns up to 1000 objects per call, in ascending alphabetical order of key within the requested Prefix.
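The order cannot be changed, but because the listing is sorted by key and the question's keys embed a date, the `StartAfter` request parameter can skip the older part of a prefix. The sketch below only builds the request parameters. The key layout (`<prefix><YYYY-MM-DD>-...`) and the function name `listParamsSince` are assumptions for illustration, not something stated in the question.

```javascript
// Assumption (not from the question): keys look like
//   "<prefix><YYYY-MM-DD>-<anything>", e.g. "sensor-a/2024-05-17-0930.json".
// Builds ListObjectsV2 parameters that start listing at a given day.
function listParamsSince(bucket, prefix, sinceDate) {
  const day = sinceDate.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  return {
    Bucket: bucket,
    Prefix: prefix,
    // StartAfter is exclusive: listing begins with the first key that sorts
    // after this string, which here is the first key dated `day` or later.
    StartAfter: prefix + day,
  };
}

// Example: parameters for everything in "sensor-a/" from 2024-05-17 onward.
const params = listParamsSince(
  "my-bucket",
  "sensor-a/",
  new Date("2024-05-17T12:00:00Z")
);
console.log(params.StartAfter); // "sensor-a/2024-05-17"
```

The result is still ascending and still capped at 1000 keys per call, so this only helps when the chosen start date leaves a manageable number of keys; it does not filter by the object's actual creation time.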

You could use Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects.

If you need real-time or fairly fast access to a list of all available objects, your other option would be to trigger an AWS Lambda function whenever objects are created/deleted. The Lambda function would store/update information in a database (e.g. DynamoDB) that can provide very fast access to the list of objects. You would need to code this solution.
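A minimal sketch of the two halves of that idea is shown below, under stated assumptions: a hypothetical DynamoDB table with partition key `prefix` and sort key `objectKey`, and hypothetical helper names. The first function turns an S3 event notification into index updates; the second builds parameters for a newest-first query, intended for `QueryCommand` from `@aws-sdk/lib-dynamodb`. The actual writes and the Lambda handler wiring are left out.

```javascript
// Hypothetical table layout: partition key `prefix`, sort key `objectKey`.

// Turns an S3 event notification into a list of index updates.
function indexActionsFromS3Event(event) {
  return event.Records.map((record) => {
    // S3 event notifications URL-encode the key, with spaces sent as "+".
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const prefix = key.slice(0, key.indexOf("/") + 1); // top-level "folder"
    const isDelete = record.eventName.startsWith("ObjectRemoved:");
    return { action: isDelete ? "delete" : "put", prefix, objectKey: key };
  });
}

// Builds Query parameters that return the highest sort keys first.
function newestFirstQuery(tableName, prefix, limit) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#p = :p",
    ExpressionAttributeNames: { "#p": "prefix" },
    ExpressionAttributeValues: { ":p": prefix },
    ScanIndexForward: false, // descending sort-key order
    Limit: limit,
  };
}
```

Because the sort key is the object key itself, descending order means newest first only if keys sort chronologically, as the question says they do. Storing a timestamp as the sort key would be an alternative design if they did not.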

