
How to choose a good partition key for Azure Cosmos DB

I'm new to Azure Cosmos DB, but I'd like a clear understanding of the following:

  1. What is the partition key?

My understanding is shallow for now: items with the same partition key go to the same partition for storage, which should help with load balancing as the system grows.

  2. How do I choose a good partition key? Could somebody please provide an example?

Thanks a lot!

1. What is the partition key?

In Azure Cosmos DB, there are two kinds of partitions: physical partitions and logical partitions.

A. A physical partition is a fixed amount of reserved SSD-backed storage combined with a variable amount of compute resources (CPU and memory).

B. A logical partition is a partition within a physical partition that stores all the data associated with a single partition key value.

I think the partition key you mentioned is the logical partition key. The partition key acts as a logical partition for your data and provides Azure Cosmos DB with a natural boundary for distributing data across physical partitions. For more details, refer to "How does partitioning work".
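To make the physical/logical distinction concrete, here is a minimal Python sketch. Cosmos DB hashes each item's partition key value and spreads ranges of hash values across physical partitions; the sketch stands in for that with an assumed MD5-modulo-N scheme and a hypothetical fixed count of four physical partitions, so it models the idea rather than the service's actual algorithm. Items sharing a partition key value always form one logical partition, and a logical partition never spans more than one physical partition.

```python
import hashlib
from collections import defaultdict

# Hypothetical number of physical partitions; Cosmos DB manages this itself.
NUM_PHYSICAL_PARTITIONS = 4


def physical_partition_for(pk_value: str, num_partitions: int = NUM_PHYSICAL_PARTITIONS) -> int:
    """Map a partition key value to a physical partition index.

    Assumption: MD5 modulo N stands in for Cosmos DB's internal hashing,
    which works differently and is not exposed like this.
    """
    digest = hashlib.md5(pk_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def place_items(items, pk_field):
    """Group items into logical partitions, then logical partitions into physical ones."""
    layout = defaultdict(lambda: defaultdict(list))  # physical -> logical -> item ids
    for item in items:
        pk_value = str(item[pk_field])
        layout[physical_partition_for(pk_value)][pk_value].append(item["id"])
    return layout


items = [
    {"id": "1", "color": "red"},
    {"id": "2", "color": "blue"},
    {"id": "3", "color": "red"},
    {"id": "4", "color": "green"},
]

# Each line shows one physical partition and the logical partitions it holds.
for physical, logicals in sorted(place_items(items, "color").items()):
    print(physical, dict(logicals))
```

Both "red" items always end up in the same logical partition, and therefore on the same physical partition, whatever the hash happens to be.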

2. How do I choose a good partition key? Could somebody please provide an example?

You need to pick a property that has a wide range of values and even access patterns. An ideal partition key is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure your solution is scalable.
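Cardinality and evenness can be checked on a sample of your data before committing to a key. The minimal sketch below, with hypothetical fields and sample data, counts the distinct values of each candidate property and the share of items held by its most common value. It measures data distribution only; access patterns (how often each value is read or written) would need to be checked separately, for example from query logs.

```python
from collections import Counter


def key_stats(items, field):
    """Return (distinct values, share of items held by the most common value)."""
    if not items:
        raise ValueError("need at least one item to evaluate a key")
    counts = Counter(item[field] for item in items)
    hottest_share = max(counts.values()) / len(items)
    return len(counts), hottest_share


# Hypothetical sample of 100 items; in practice you would sample real data.
sample = [
    {"id": str(i), "color": color, "tenant": "acme" if i % 10 else "other"}
    for i, color in enumerate(["red", "blue", "green", "red", "blue"] * 20)
]

for field in ("id", "color", "tenant"):
    distinct, hottest = key_stats(sample, field)
    print(f"{field}: {distinct} distinct values, hottest value holds {hottest:.0%} of items")
```

Here `id` has the most distinct values and the flattest distribution, `color` has three values with the hottest holding 40% of items, and `tenant` concentrates 90% of items on one value, which would make it a poor key for this data.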

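For example, suppose your items have fields named id and color, and you filter on color more often. Then you should pick color rather than id as the partition key, which is more efficient for your query performance: every item has a different id, but many items can share the same color, so a query filtered by color can be served from a single logical partition instead of fanning out across all partitions. As long as color has a wide range of values, the key stays scalable, and adding a new color simply adds a new logical partition.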
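The query-performance argument comes down to routing: a query whose filter includes the partition key can be sent to one partition, while any other query has to fan out to every physical partition. The simplified Python sketch below counts the partitions such a query would visit; the four-partition count and the MD5 hash are assumptions standing in for Cosmos DB's internal placement.

```python
import hashlib

NUM_PHYSICAL_PARTITIONS = 4  # hypothetical; Cosmos DB decides this itself


def physical_partition_for(pk_value, n=NUM_PHYSICAL_PARTITIONS):
    # Stand-in hash (assumption), not Cosmos DB's real hashing scheme.
    digest = hashlib.md5(str(pk_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def partitions_scanned(items, pk_field, filter_field):
    """Physical partitions a query `WHERE c.<filter_field> = @v` must visit."""
    if filter_field == pk_field:
        # The filter pins the partition key, so the query is routed to one partition.
        return 1
    # Otherwise it fans out to every physical partition that holds data.
    return len({physical_partition_for(item[pk_field]) for item in items})


items = [{"id": str(i), "color": ["red", "blue", "green"][i % 3]} for i in range(30)]
print("pk=/color:", partitions_scanned(items, "color", "color"))
print("pk=/id:   ", partitions_scanned(items, "id", "color"))
```

With `/color` as the key, filtering on color visits one partition; with `/id` as the key, the same filter visits every partition holding data. In the `azure-cosmos` Python SDK the difference shows up as passing `partition_key=` to `ContainerProxy.query_items` versus setting `enable_cross_partition_query=True`.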

For more details, please read "Partition and scale in Azure Cosmos DB".

Hope it helps you.

You have to choose your partition key based on your workload. Workloads can be classified into two types:

  • Read-heavy
  • Write-heavy

Read-heavy workloads are those where data is read more often than it is written, such as a product catalog: catalog entries are inserted or updated rarely, while people browse the products constantly.

Write-heavy workloads are those where data is written more often than it is read. A common scenario is IoT devices sending readings from multiple sensors: you may receive data every second, so you will be writing a lot of data to Cosmos DB.

For read-heavy workloads, choose as the partition key a property that is used in your query filters. In the product example, that would be the product id, which is what is mostly used to fetch data when a user wants to read product information and browse its reviews.
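For the product catalog, one way to make product id do the work is to store each product and its reviews as separate items that share the same productId partition key value, so rendering a product page reads a single logical partition. The sketch below models a container partitioned on a hypothetical `/productId` path with an in-memory dictionary; the field names (`productId`, `type`, `stars`) and the co-location of reviews are assumptions for illustration, not something the answer prescribes.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a container partitioned on /productId.
container = defaultdict(list)  # partition key value -> items in that logical partition


def insert(item):
    """Store an item in the logical partition named by its productId."""
    container[item["productId"]].append(item)


def product_page(product_id):
    """Fetch a product and its reviews; only one logical partition is read."""
    partition = container.get(product_id, [])
    product = next((i for i in partition if i["type"] == "product"), None)
    reviews = [i for i in partition if i["type"] == "review"]
    return product, reviews


insert({"id": "p1", "productId": "p1", "type": "product", "name": "Kettle"})
insert({"id": "r1", "productId": "p1", "type": "review", "stars": 5})
insert({"id": "r2", "productId": "p1", "type": "review", "stars": 3})
insert({"id": "p2", "productId": "p2", "type": "product", "name": "Toaster"})

product, reviews = product_page("p1")
print(product["name"], [r["stars"] for r in reviews])  # Kettle [5, 3]
```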

For write-heavy workloads, choose a partition key property with more unique values. For example, in the IoT scenario, use a partition key such as deviceid_signaldatetime, which concatenates the ID of the device that sends the signal with the date and time of the signal; the combined value is far more unique than either part alone.
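The effect of such a synthetic key on write distribution can be sketched in Python. Assuming hypothetical readings from three devices, one per second for a minute, the code below builds a deviceid_signaldatetime-style value (the exact format, device ID plus ISO 8601 timestamp joined by an underscore, is an assumption) and counts how many distinct partition key values each scheme produces and how many writes land on the busiest value.

```python
from collections import Counter
from datetime import datetime, timedelta


def synthetic_key(device_id: str, signal_time: datetime) -> str:
    """Concatenate device ID and signal timestamp, e.g. 'dev-1_2024-01-01T00:00:00'."""
    return f"{device_id}_{signal_time.isoformat()}"


# Hypothetical readings: 3 devices, one reading per second for 60 seconds.
start = datetime(2024, 1, 1)
readings = [
    {"deviceId": f"dev-{d}", "time": start + timedelta(seconds=s)}
    for d in range(3)
    for s in range(60)
]

by_device = Counter(r["deviceId"] for r in readings)
by_synthetic = Counter(synthetic_key(r["deviceId"], r["time"]) for r in readings)

print(len(by_device), max(by_device.values()))        # 3 60
print(len(by_synthetic), max(by_synthetic.values()))  # 180 1
```

Keying on the device ID alone funnels all 60 writes per device into one logical partition, while the synthetic key gives every reading its own value. The trade-off is that reading one device's history then requires a cross-partition query, because that device's readings no longer share a partition key value.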
