Performance of listing S3 bucket with prefix and delimiter

Original post: 2016-08-13 08:33:12 | Tags: amazon-web-services, amazon-s3

According to the listing documentation, it is possible to treat a large number of keys as though they were hierarchical. I am planning to store a large number of keys (let's say a few hundred million), distributed over a sensibly sized 'hierarchy'.

What is the performance of using a prefix and delimiter? Does it require a full enumeration of keys on the S3 side, making it an O(n) operation? I have no idea whether keys are stored in a big hash table, whether there are indexing data structures, or whether they're stored in a tree or something else.

I want to avoid a situation where I have a very large number of keys and navigating the 'hierarchy' suddenly becomes slow.

So if I have the following keys:

abc/def/ghi/0
abc/def/ghi/1
abc/def/ghi/...
abc/def/ghi/100,000,000,000

Will this affect the speed of the query Delimiter='/', Prefix='abc/def'?
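For reference, a minimal Python sketch of issuing that query through boto3's `list_objects_v2` paginator (each response page holds at most 1,000 entries, hence the paginator). The function name `list_level` is hypothetical. Note that with Prefix='abc/def' (no trailing slash), S3 rolls everything up to the first '/' after the prefix, so the answer is the single common prefix `abc/def/`; to list the children of that "folder", pass `abc/def/` instead.

```python
def list_level(client, bucket, prefix, delimiter="/"):
    """Return (object_keys, common_prefixes) for one 'directory level'.

    `client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    keys, prefixes = [], []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter=delimiter):
        # Objects directly at this level (no delimiter after the prefix).
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        # Deeper keys, rolled up to the first delimiter after the prefix.
        prefixes.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
    return keys, prefixes
```

With boto3 installed and credentials configured, a call would look like `list_level(boto3.client("s3"), "my-bucket", "abc/def/")`, where `my-bucket` is a placeholder.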

3 answers

Aside from the Request Rate and Performance Considerations document that Sandeep referenced (which doesn't apply to your use case), AWS hasn't published much about S3 performance. It's probably proprietary. So I doubt you'll find much information unless you can somehow get it from AWS directly.

However, some things to keep in mind:

Amazon S3 is built for massive scale. Millions of companies use S3, with millions of keys in millions of buckets.
AWS promotes prefix + delimiter listing as a perfectly valid use case.
There are common computer-science data structures and algorithms that AWS is probably using behind the scenes to retrieve keys efficiently. One such data structure is the trie, or prefix tree.
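To make the trie idea concrete, here is a small Python sketch. It is only an illustration of the data structure, not how S3 is actually implemented (that isn't public); the classes `TrieNode` and `KeyTrie` and the method `list_prefix` are hypothetical, and it assumes a single-character delimiter. After walking to the prefix node, it descends only until it meets the delimiter, so a branch like `abc/def/ghi/` is reported as one common prefix without visiting the keys beneath it.

```python
class TrieNode:
    """One character position in the key trie."""
    __slots__ = ("children", "is_key")

    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.is_key = False  # True if a stored key ends at this node


class KeyTrie:
    """Hypothetical character trie over object keys (illustration only)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def list_prefix(self, prefix, delimiter="/"):
        """Return (keys, common_prefixes) for prefix + single-char delimiter."""
        node = self.root
        for ch in prefix:  # O(len(prefix)) walk down to the prefix node
            node = node.children.get(ch)
            if node is None:
                return [], []
        keys, common = [], []

        def walk(path, current):
            if current.is_key:
                keys.append(path)
            for ch in sorted(current.children):  # sorted children: sorted output
                if ch == delimiter:
                    # Everything below rolls up into one common prefix;
                    # the (possibly huge) subtree is never visited.
                    common.append(path + ch)
                else:
                    walk(path + ch, current.children[ch])

        walk(prefix, node)
        return keys, common
```

With the question's keys loaded, `list_prefix("abc/def/")` does the same amount of work whether `abc/def/ghi/` holds ten keys or a hundred billion.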

Based on all of the above, chances are that retrieving a listing of keys is much better than O(n). I think you can safely use prefixes and delimiters for your hierarchy.
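S3's internals aren't public, but the AWS documentation quoted in the next answer says object keys are stored in UTF-8 binary order. Assuming a sorted index like that, a rough Python model shows why a delimiter listing need not be O(n): one binary search finds the first key with the prefix, and each common prefix is skipped with one more binary search. The function `list_sorted_index` and its `max_entries` page limit are hypothetical; Python's string ordering (by code point) matches UTF-8 byte order, and the delimiter is assumed non-empty.

```python
import bisect


def list_sorted_index(sorted_keys, prefix, delimiter="/", max_entries=1000):
    """Model a prefix + delimiter listing over a sorted list of keys.

    Returns (keys, common_prefixes, probes), where `probes` counts binary
    searches, to show the work does not grow with the keys under a group.
    """
    keys, common = [], []
    i = bisect.bisect_left(sorted_keys, prefix)  # seek to the first candidate
    probes = 1
    while i < len(sorted_keys) and len(keys) + len(common) < max_entries:
        key = sorted_keys[i]
        if not key.startswith(prefix):
            break  # sorted order: no later key can start with the prefix
        cut = key.find(delimiter, len(prefix))
        if cut == -1:
            keys.append(key)  # no delimiter after the prefix: a direct child
            i += 1
            continue
        group = key[: cut + len(delimiter)]
        common.append(group)
        # Every key starting with `group` sorts before `upper`, so one binary
        # search jumps past the whole group, however many keys it holds.
        upper = group[:-1] + chr(ord(group[-1]) + 1)
        i = bisect.bisect_left(sorted_keys, upper, i)
        probes += 1
    return keys, common, probes
```

In this model, listing `abc/def/` over 100,000 keys under `abc/def/ghi/` costs three binary searches, not a scan of all 100,000.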

As long as you are not using a sequential pattern (such as dates 2016-13-08, 2016-13-09, and so on) at the start of your keys, you shouldn't face any problems. If your keys are auto-generated as a sequence, prepend a randomly generated hash prefix to them (aidk-2016-13-08, ujlk-2016-13-09). The Amazon documentation says:

Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored in UTF-8 binary ordering across multiple partitions in the index. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. If you introduce some randomness in your key name prefixes, the key names, and therefore the I/O load, will be distributed across more than one partition.

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
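A minimal Python sketch of the prefixing idea. It swaps the random prefix for a short SHA-256 hash of the key itself, a deliberately different choice: a deterministic prefix can be recomputed from the original name when you need to fetch the object. The helpers `hashed_key` and `original_key` and the 4-character width are hypothetical. Per AWS's later guidance, cited in the last answer, this should no longer be necessary for request-rate reasons.

```python
import hashlib


def hashed_key(key: str, width: int = 4) -> str:
    """Prepend `width` hex chars of SHA-256(key), giving e.g. 'xxxx-2016-13-08'.

    Deterministic: the same original key always maps to the same stored key.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:width]}-{key}"


def original_key(stored: str) -> str:
    """Strip the hash prefix added by hashed_key()."""
    return stored.split("-", 1)[1]
```

The trade-off is that keys no longer sort by date, so a listing such as Prefix='2016-' stops finding them.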

Amazon has since indicated that prefix naming strategies, such as randomized hashing, no longer influence S3 performance: an application can now achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix.

https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html