Performance of listing S3 bucket with prefix and delimiter
According to the listing documentation it is possible to treat a large number of keys as though they were hierarchical. I am planning to store a large number of keys (let's say a few hundred million), distributed over a sensible-sized 'hierarchy'.
What is the performance of using a prefix and delimiter? Does it require a full enumeration of keys at the S3 end, making it an O(n) operation? I have no idea whether keys are stored in a big hash table, whether they have indexing data structures, or whether they're stored in a tree or something else.
I want to avoid the situation where I have a very large number of keys and navigating the 'hierarchy' suddenly becomes difficult.
So if I have the following keys:
abc/def/ghi/0
abc/def/ghi/1
abc/def/ghi/...
abc/def/ghi/100,000,000,000
Will it affect the speed of the query Delimiter='/', Prefix='abc/def'?
Aside from the Request Rate and Performance Considerations document that Sandeep referenced (which is not applicable to your use case), AWS hasn't published much about S3 performance. The details are probably proprietary, so I doubt you'll find much information unless you can get it from AWS directly.
However, some things to keep in mind:
Based on all of the above, chances are that retrieving a listing of keys is much better than an O(n) operation. I think you are safe to use prefixes and delimiters for your hierarchy.
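To illustrate why a delimiter listing need not touch every key, here is a minimal local sketch. S3's internals are not public, so this is only a model: it assumes keys live in some sorted index (represented here by a plain sorted Python list), and `list_keys` is a hypothetical helper, not an S3 or boto3 call. It groups keys into contents and common prefixes the way a prefix/delimiter listing does, and uses binary search to jump over each whole group instead of scanning it.

```python
import bisect


def list_keys(sorted_keys, prefix="", delimiter="/"):
    """Model of a prefix/delimiter listing over a sorted list of keys.

    Hypothetical helper for illustration only; assumes a non-empty
    delimiter and keys sorted in code point order.
    Returns (contents, common_prefixes).
    """
    contents, common_prefixes = [], []
    # Binary search to the first key that could start with the prefix.
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        key = sorted_keys[i]
        cut = key.find(delimiter, len(prefix))
        if cut == -1:
            # No delimiter after the prefix: the key is listed directly.
            contents.append(key)
            i += 1
        else:
            # Everything up to and including the delimiter is one group.
            group = key[:cut + len(delimiter)]
            common_prefixes.append(group)
            # Smallest string that sorts after every key in the group.
            upper = group[:-1] + chr(ord(group[-1]) + 1)
            # Jump past the whole group without visiting its keys.
            i = bisect.bisect_left(sorted_keys, upper, i)
    return contents, common_prefixes


keys = sorted([
    "abc/def/ghi/0",
    "abc/def/ghi/1",
    "abc/def/x",
    "abc/zzz",
    "abd/1",
])
print(list_keys(keys, prefix="abc/def", delimiter="/"))
print(list_keys(keys, prefix="abc/def/", delimiter="/"))
```

In this model the work grows with the number of entries returned (times a logarithmic search cost), not with the number of keys under each group. Whether S3 behaves the same way is an assumption, not something AWS documents.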
As long as you are not using a continuous sequence (such as the dates 2016-13-08, 2016-13-09 and so on) in the prefix, you shouldn't face any problem. If your keys are auto-generated as a continuous sequence, prepend a randomly generated hash to each key (aidk-2016-13-08, ujlk-2016-13-09). The Amazon documentation says:
Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored in UTF-8 binary ordering across multiple partitions in the index. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. If you introduce some randomness in your key name prefixes, the key names, and therefore the I/O load, will be distributed across more than one partition.
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
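A minimal sketch of the key-prefixing idea described above. Note one deliberate change: the answer suggests a randomly generated prefix, but a random prefix cannot be recomputed later when you want to fetch the object, so this sketch swaps in a short prefix derived from an MD5 digest of the key itself. The function name `with_hash_prefix` and the 4-character length are assumptions made for illustration.

```python
import hashlib


def with_hash_prefix(key, length=4):
    """Prepend a short, deterministic hash of the key to the key.

    Hypothetical helper: the prefix comes from the MD5 hex digest of the
    key, so the same key always maps to the same prefixed name.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:length]}-{key}"


for name in ["2016-13-08", "2016-13-09"]:
    print(with_hash_prefix(name))
```

The trade-off is that hashed prefixes scatter sequential keys, so they can no longer be listed together by a shared date prefix.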
Amazon indicates that prefix naming strategies, such as randomized hashing, no longer influence S3 lookup performance.
https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html