Some questions about Cosmos DB Physical and Logical Partitions

I am trying to understand the relationship between Physical/Logical partitions and throughput availability in Azure Cosmos DB and have a few questions.

Reference documentation: https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview .

Based on the documentation, here's my understanding:

  1. Each physical partition can hold up to 50GB of data, whereas each logical partition can hold up to 20GB.
  2. Total provisioned throughput is evenly distributed amongst all physical partitions.
  3. Each physical partition can have a maximum of 10000 RU/s.
  4. The Cosmos DB engine automatically creates physical partitions as and when they are needed and moves the logical partitions accordingly.

Now my questions are:

  • What is the logic behind creation of additional physical partitions?

Is it based on the space occupied by logical partitions, on the throughput consumed by all logical partitions in a physical partition, or on something else entirely? For example:

  1. Will the Cosmos DB engine automatically create 2 physical partitions if I provision a throughput of 20000 RU/s (regardless of whether I use it or not)?
  2. Will the Cosmos DB engine create a single physical partition to begin with (I have just created a container with no data inside it and the provisioned throughput is less than 10000 RU/s)?
  3. Will the Cosmos DB engine automatically remove physical partitions if the total provisioned throughput drops below 10000 RU/s and/or the total size of the logical partitions falls below 50GB?

Any insights into this will be highly appreciated.

UPDATE

Based on the comments, I have split the original question into two parts. The second part can be found here: How is the throughput available for a physical partition split amongst its logical partition in Cosmos DB?

Some answers.

  1. Cosmos will actually create 3 partitions if you provision a new container with 20K RU/s. However, if you start with less, say 5K RU/s, and then scale up, it will create 1 partition and later increase to 2 partitions. The reason for the difference is that we try to reduce the number of partition splits early on, because users tend to ingest data during initial provisioning, often accompanied by a further increase in throughput. To reduce the number of partition splits, we provision each physical partition at approx 60% of 10K RU/s. However, we don't apply this 60% universally because it's wasteful. It's just an optimization we make during initial provisioning based upon observed user patterns. It's also one of many reasons why you should not care about physical partitions and should instead focus on your logical partition key. The 60% here is an implementation detail and can change at any time.

  2. Yes.

  3. Not yet, but it is coming. No ETA.

Throughput is always equally distributed so yes, 18K spread across 3 partitions, each would get 6K RU/s.
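To make the arithmetic above concrete, here is a minimal sketch of the even split. The function name is made up for illustration and is not part of any Cosmos DB SDK; the only rule it encodes is the one stated above, that provisioned throughput is divided equally across physical partitions.

```python
def ru_per_physical_partition(provisioned_ru: int, physical_partitions: int) -> float:
    """Return the RU/s share each physical partition receives.

    Hypothetical helper: it only models the even split described in the
    answer, not any real Cosmos DB API.
    """
    if physical_partitions < 1:
        raise ValueError("a container has at least one physical partition")
    return provisioned_ru / physical_partitions


# 18K RU/s spread across 3 physical partitions
print(ru_per_physical_partition(18_000, 3))  # 6000.0
```

Note that the share depends only on what is provisioned and on the partition count, not on how much each partition actually consumes.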

Is it based on the space occupied by logical partitions or based on the throughput consumed by all logical partitions

Splitting into physical partitions happens based on the throughput provisioned as well as the storage consumed on a single partition. Examples of when Cosmos will create a new physical partition:

  1. You provision a 6000RU/s DB and ingest 60GB of data.
  2. You provision a 15000RU/s DB and ingest 10GB of data.

You can think of a physical partition as a computer that can handle at most 50GB of storage and 10K RU/s. Anything more than this will cause a split. The DB throughput is split evenly among the physical partitions, not the logical partitions.
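The "computer with 50GB and 10K RU/s" mental model can be sketched as a lower bound on the number of physical partitions. This is a simplified model under the limits quoted in this thread, not the service's actual algorithm; the function and constant names are assumptions for illustration. The real service may create more partitions than this minimum (for example, 3 for a new 20K RU/s container, as noted in the first answer).

```python
import math

# Limits quoted in this thread; treat them as assumptions that may change.
MAX_STORAGE_GB = 50
MAX_RU_PER_SEC = 10_000


def min_physical_partitions(provisioned_ru: float, storage_gb: float) -> int:
    """Lower bound on physical partitions under the simplified model.

    A partition must be split when either its throughput share or its
    storage would exceed the per-partition limit, so the bound is the
    larger of the two requirements (and never less than one).
    """
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_SEC)
    by_storage = math.ceil(storage_gb / MAX_STORAGE_GB)
    return max(1, by_throughput, by_storage)


# Example 1: storage forces the split (60GB > 50GB)
print(min_physical_partitions(6_000, 60))   # 2
# Example 2: throughput forces the split (15000 RU/s > 10K RU/s)
print(min_physical_partitions(15_000, 10))  # 2
```

Both examples from the list need at least two partitions, but for different reasons: the first because of storage, the second because of throughput.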

From the documentation it seems the size or utilization of a logical partition does not really matter and I could have some logical partitions getting more requests than others but as long as I am not exceeding the available throughput of the physical partition, I should be fine. Is this correct?

This is kind of true. The logical partition size does matter: it can't exceed 20GB. Its utilization is also limited to 10K RU/s. You have no control over how logical partitions are assigned to physical partitions, so there is no real way to know which physical partition a given logical partition lives on. Similarly, there is no way to ensure that you don't exceed the 10K RU/s throughput of a physical partition. This is why MS recommends choosing a partition key that keeps utilization balanced.
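To illustrate why a balanced partition key matters, here is a small simulation. It assumes a hypothetical `placement` mapping from logical partition keys to physical partition ids; as noted above, the service does not expose this mapping, so the data here is invented purely for illustration. The sketch flags physical partitions whose combined consumption exceeds their even share of the provisioned throughput.

```python
from collections import defaultdict


def overloaded_partitions(placement, consumed_ru, provisioned_ru):
    """Return {physical_partition_id: total_ru} for partitions over their share.

    placement   -- hypothetical map: logical partition key -> physical partition id
    consumed_ru -- RU/s currently consumed by each logical partition key
    """
    physical_ids = set(placement.values())
    share = provisioned_ru / len(physical_ids)  # even split across physical partitions

    totals = defaultdict(float)
    for key, ru in consumed_ru.items():
        totals[placement[key]] += ru

    return {pid: total for pid, total in totals.items() if total > share}


# Invented example: 12K RU/s over 2 physical partitions gives 6K RU/s each.
placement = {"tenant-a": 0, "tenant-b": 0, "tenant-c": 1}
consumed = {"tenant-a": 5_500, "tenant-b": 1_500, "tenant-c": 2_000}

print(overloaded_partitions(placement, consumed, 12_000))  # {0: 7000.0}
```

Total consumption here (9K RU/s) is below the provisioned 12K RU/s, yet partition 0 exceeds its 6K RU/s share because two busy keys landed on it. In this model, those requests would be the ones at risk of throttling.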
