
Throughput value for Azure Cosmos DB

I am confused about how partitioning affects the size limit and throughput value for Azure Cosmos DB (in our case, we are using DocumentDB). If I understand the documentation correctly:

  1. for a partitioned collection, the 10 GB storage limit applies to each partition?

  2. the throughput value, e.g. 400 RU/s, applies to each partition, not the collection?

  1. Whether you use single-partition or multi-partition collections, each partition can be up to 10 GB. This means that a single-partition collection cannot exceed that size, whereas a multi-partition collection can.

Taken from Azure Cosmos DB FAQ:

What is a collection?

A collection is a group of documents and their associated JavaScript application logic. A collection is a billable entity, where the cost is determined by the throughput and used storage. Collections can span one or more partitions or servers and can scale to handle practically unlimited volumes of storage or throughput.
Collections are also the billing entities for Azure Cosmos DB. Each collection is billed hourly, based on the provisioned throughput and used storage space. For more information, see Azure Cosmos DB Pricing.

  2. Billing is per collection, where one collection can have one or more partitions. Since Azure allocates partitions to host your collection, the number of RUs has to be set per collection. Otherwise a customer with a large number of partitions would get far more RUs than a customer with an equivalent collection but far fewer partitions.

For more info, see the bold text in the quote below:

Taken from Azure Cosmos DB Pricing:

Provisioned throughput

At any scale, you can store data and provision throughput capacity. Each container is billed hourly based on the amount of data stored (in GBs) and throughput reserved in units of 100 RUs/second, with a minimum of 400 RUs/second. Unlimited containers have a minimum of 100 RUs/second per partition.

Taken from Request Units in Azure Cosmos DB:

When starting a new collection, table or graph, you specify the number of request units per second (RU per second) you want reserved. Based on the provisioned throughput, Azure Cosmos DB allocates physical partitions to host your collection and splits/rebalances data across partitions as it grows.

The other answers here provide a great starting point on throughput provisioning but fail to touch on an important point that doesn't get mentioned often in the docs.

Your throughput is actually divided across the number of physical partitions in your collection. So for a multi-partition collection provisioned for 1000 RU/s with 10 physical partitions, it's actually 100 RU/s per partition. If you have hot partitions that get accessed more frequently, you'll receive throttling errors even though you haven't exceeded the total RU assigned to the collection.

For a single-partition collection you obviously get the full RU assigned to that partition, since it's the only one.
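As a rough illustration of the arithmetic above, here is a minimal Python sketch. It assumes the provisioned throughput is split evenly across physical partitions, as described in this answer; the function names and the demand figures are made up for the example and are not part of any Cosmos DB SDK.

```python
def per_partition_ru(provisioned_ru: float, physical_partitions: int) -> float:
    """Even share of the collection's RU/s that each physical partition gets."""
    if physical_partitions < 1:
        raise ValueError("a collection has at least one physical partition")
    return provisioned_ru / physical_partitions


def throttled_partitions(provisioned_ru: float, demand_by_partition: list) -> list:
    """Indices of partitions whose demand (RU/s) exceeds their even share."""
    share = per_partition_ru(provisioned_ru, len(demand_by_partition))
    return [i for i, demand in enumerate(demand_by_partition) if demand > share]


# Hypothetical workload: one hot partition, nine quiet ones.
demand = [300] + [50] * 9           # total demand is 750 RU/s
share = per_partition_ru(1000, 10)  # each partition gets 100 RU/s
hot = throttled_partitions(1000, demand)
```

In this made-up workload the total demand stays below the 1000 RU/s provisioned for the collection, yet the first partition asks for more than its 100 RU/s share, which is the situation that produces throttling errors.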

If you're using a multi-partition collection, you should strive to pick a partition key that has an even access pattern so that your workload can be evenly distributed across the underlying partitions without bottlenecking.
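To get a feel for why the choice of key matters, the following sketch compares a skewed key with an evenly used one. It is only a simulation under stated assumptions: MD5 modulo the partition count stands in for whatever hashing Cosmos DB actually uses internally, and the tenant and user key names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_of(key: str, partitions: int) -> int:
    """Map a partition key value to a partition index (stand-in hash, not Cosmos DB's)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def hottest_share(keys: list, partitions: int) -> float:
    """Fraction of requests that land on the busiest partition."""
    counts = Counter(partition_of(k, partitions) for k in keys)
    return max(counts.values()) / len(keys)


# Hypothetical workloads of 10,000 requests each.
skewed = ["tenant-A"] * 9000 + [f"tenant-{i}" for i in range(1000)]
even = [f"user-{i}" for i in range(10000)]

skewed_share = hottest_share(skewed, 10)  # dominated by the single hot key
even_share = hottest_share(even, 10)      # close to an even split
```

With the skewed key, at least 90% of the requests hit one partition, so that partition exhausts its share of the throughput long before the collection as a whole does. With the evenly used key, the busiest partition handles only a little more than its fair tenth.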

  1. for a partitioned collection, the 10 GB storage limit applies to each partition?

That is correct. Each partition in a partitioned collection can be a maximum of 10 GB in size.
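A quick way to reason about this limit is to compute a lower bound on the number of partitions a given data size needs. The sketch below assumes the 10 GB per-partition cap quoted in this thread; the function names are illustrative, and the actual number of partitions is decided by the service, not by this formula.

```python
import math

# Per-partition storage cap quoted in this thread.
PARTITION_LIMIT_GB = 10


def min_partitions_for(data_gb: float) -> int:
    """Smallest number of partitions whose combined cap can hold data_gb."""
    if data_gb < 0:
        raise ValueError("data size cannot be negative")
    return max(1, math.ceil(data_gb / PARTITION_LIMIT_GB))


def fits_single_partition(data_gb: float) -> bool:
    """True if the data could live in a single-partition collection."""
    return data_gb <= PARTITION_LIMIT_GB
```

For example, `min_partitions_for(25)` gives 3, and `fits_single_partition(25)` is false, so 25 GB of data needs a partitioned collection. The bound is only reached if the data is spread evenly over the partition key values.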

  2. the throughput value, e.g. 400 RU/s, applies to each partition, not the collection?

The throughput is set at the collection level, not at the partition level. Furthermore, the minimum for a partitioned collection is 2500 RU/s, not 400 RU/s; 400 RU/s is the default for a non-partitioned collection.
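The figures above can be captured in a small helper that adjusts a requested throughput to a value the collection would accept. This is a sketch only: it uses the minimums quoted in this answer together with the 100 RU/s billing unit from the pricing quote earlier in the thread, treats the 400 RU/s default as the floor for a non-partitioned collection, and the function name is hypothetical. The limits reflect the service as described here and may have changed since.

```python
# Figures quoted in this thread.
MIN_RU_NON_PARTITIONED = 400
MIN_RU_PARTITIONED = 2500
RU_INCREMENT = 100


def normalize_throughput(requested_ru: int, partitioned: bool) -> int:
    """Raise requested_ru to the quoted minimum and round up to a multiple of 100."""
    minimum = MIN_RU_PARTITIONED if partitioned else MIN_RU_NON_PARTITIONED
    ru = max(requested_ru, minimum)
    # Ceiling division with integers, then scale back up to RU/s.
    return -(-ru // RU_INCREMENT) * RU_INCREMENT
```

So a request for 1000 RU/s on a partitioned collection would be raised to 2500 RU/s, while the same request on a non-partitioned collection would stay at 1000 RU/s.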
