
Azure Cosmos DB partition key design selection

Selecting a partition key is a simple but important design choice in Azure Cosmos DB, both for performance and for cost (RUs). Azure Cosmos DB does not allow us to change the partition key afterwards, so it is very important to select the right one.

I have gone through the Microsoft documentation (Link),

but I am still confused about how to choose a partition key.

Below is the item structure I am planning to create:

{
   "id": "unique id like UUID", # just to keep some unique ID for item
   "file_location": "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/nvidia.mp4", # This value sometimes contains special characters such as spaces, dollar signs, capital letters and more
   "createatedby": "andrew",
   "ts": "2022-01-10 16:07:25.773000",
   "directory_location": "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/", 
   "metadata": [
      {
        "codec": "apple",
        "date_created": "2020-07-23 05:42:37",
        "date_modified": "2020-07-23 05:42:37",
        "format": "mp4",
        "internet_media_type": "video/mp4",
        "size": "1286011"
      }
    ],
   "version_id": "48ad8200-7231-11ec-abda-34519746721"
}

I am using the Azure Cosmos DB SQL API. By default, Azure Cosmos DB takes care of indexing all data, so in the case above all properties are indexed.

For reading items I use the file_location property. Can I make file_location the partition key, or is there anything else to consider?

A few notes:

file_location values contain special characters such as spaces, commas, dollar signs and more.

A few containers hold 150 million entries and a few hold just 20 million.

My operations are:

mostly reads, frequent writes as new videos are added, and occasional updates when videos change.

A few things to keep in mind while selecting partition keys:

  • Look at the query parameters you use when reading data; they give good hints about which properties are partition key candidates.
  • You mentioned that a few containers hold 150 million documents and a few hold 20 million. What matters is not the number of documents stored in a container but which containers receive the most requests. If a few containers receive a disproportionate share of the requests, that is a good indicator of poorly designed partition keys.
  • Try to distribute the request load as evenly as possible across partition key values so that it is spread evenly among the physical partitions. Otherwise you will run into hot-partition issues and end up working around them by increasing throughput, which costs more.
  • Try to limit cross-partition queries as much as possible.
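
One rough way to compare candidates before committing is to count how a sample of items would spread across the values of each property. The sketch below is only an illustration: `key_distribution` and `sample_items` are hypothetical names, the sample data is made up in the shape of the item from the question, and it measures how documents are distributed, not how requests are distributed. Running the same count over a log of the key values your reads and writes touch would be closer to the request-load point above.

```python
from collections import Counter


def key_distribution(items, key):
    """Return (number of distinct values, share of items in the largest group)
    for one candidate partition key property."""
    counts = Counter(item[key] for item in items)
    largest_share = max(counts.values()) / len(items)
    return len(counts), largest_share


# Hypothetical sample items, shaped like the document in the question.
sample_items = [
    {"id": "1", "file_location": "/videos/news/finance/a.mp4",
     "directory_location": "/videos/news/finance/", "createatedby": "andrew"},
    {"id": "2", "file_location": "/videos/news/finance/b.mp4",
     "directory_location": "/videos/news/finance/", "createatedby": "andrew"},
    {"id": "3", "file_location": "/videos/news/sports/c.mp4",
     "directory_location": "/videos/news/sports/", "createatedby": "andrew"},
    {"id": "4", "file_location": "/videos/music/d.mp4",
     "directory_location": "/videos/music/", "createatedby": "maria"},
]

for candidate in ("file_location", "directory_location", "createatedby"):
    distinct, share = key_distribution(sample_items, candidate)
    # Many distinct values and a small largest share suggest an even spread.
    print(f"{candidate}: {distinct} distinct values, largest group holds {share:.0%}")
```

On this toy sample, file_location gives the most even spread and createatedby the least. Since reads in the question already filter on file_location, a key that matches that filter would also help keep those reads within a single partition.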
