
Azure Cosmos DB partition key design selection

Selecting a partition key is a simple but important design choice in Azure Cosmos DB, both for performance and for cost (RUs). Azure Cosmos DB does not allow us to change the partition key once it is set, so it is very important to select the right one.

I have gone through the Microsoft documentation (link).

But I am still unsure how to choose a partition key.

Below is the item structure I am planning to create:

{
   "id": "unique id like UUID", # just a unique ID for the item
   "file_location": "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/nvidia.mp4", # this value sometimes contains special symbols such as spaces, dollars, caps and many more
   "createatedby": "andrew",
   "ts": "2022-01-10 16:07:25.773000",
   "directory_location": "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/", 
   "metadata": [
      {
        "codec": "apple",
        "date_created": "2020-07-23 05:42:37",
        "date_modified": "2020-07-23 05:42:37",
        "format": "mp4",
        "internet_media_type": "video/mp4",
        "size": "1286011"
      }
    ],
   "version_id": "48ad8200-7231-11ec-abda-34519746721"
}

I am using the Azure Cosmos DB SQL API. By default, Azure Cosmos DB takes care of indexing all data, so in the above case all properties are indexed.

For reading items I use the file_location property. Can I use file_location as the partition key, or is there anything else to consider?

A few notes:

file_location values contain special characters such as spaces, commas, dollar signs and many more.

A few containers hold 150 million entries and a few hold just 20 million.

My operations are:

Mostly reads, frequent writes as new videos are added, and fewer updates when videos change.

A few things to keep in mind while selecting partition keys:

  • Observe the query parameters you use when reading data; they give good hints about which properties are partition key candidates.
  • You mentioned that a few containers contain 150 million documents and a few contain 20 million. What matters is not the number of documents stored in a container but which containers receive the most requests. If a few containers are getting too many requests, that is a good indicator of poorly designed partition keys.
  • Try to distribute the request load as evenly as possible among containers, and among the partition key values within each container, so that it spreads evenly across the physical partitions. Otherwise you will run into hot-partition issues and end up working around them by increasing throughput, which costs more money.
  • Try to limit cross-partition queries as much as possible.
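
To get a rough feel for the "even distribution" point, here is a minimal Python sketch that compares two candidate keys from the question, file_location and directory_location, on a made-up workload. Everything in it is an assumption for illustration: the sample paths and file counts are invented, the four buckets merely stand in for physical partitions, and MD5 is only a placeholder for whatever hashing Cosmos DB applies internally. It shows how to reason about skew, not how the service actually places data.

```python
import hashlib
from collections import Counter


def bucket_for(key_value: str, bucket_count: int) -> int:
    """Map a partition key value to a bucket (a stand-in for a physical partition)."""
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count


def load_per_bucket(items, key_name, bucket_count=4):
    """Count how many items land in each bucket for a candidate key."""
    counts = Counter(bucket_for(item[key_name], bucket_count) for item in items)
    return [counts[b] for b in range(bucket_count)]


def skew(loads):
    """Ratio of the busiest bucket to the average; 1.0 means perfectly even."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


def make_sample_items():
    """Hypothetical workload: one hot directory with 900 files, two with 50 each."""
    directories = {
        "/videos/news/finance/": 900,
        "/videos/news/sports/": 50,
        "/videos/news/weather/": 50,
    }
    items = []
    for directory, file_count in directories.items():
        for n in range(file_count):
            items.append({
                "directory_location": directory,
                # Spaces and '$' are included on purpose: hashing treats them like any other character.
                "file_location": f"{directory}clip {n}$.mp4",
            })
    return items


items = make_sample_items()
for candidate in ("file_location", "directory_location"):
    loads = load_per_bucket(items, candidate)
    print(candidate, loads, round(skew(loads), 2))
```

In this toy data the high-cardinality file_location spreads items fairly evenly, while directory_location piles the hot directory into a single bucket. Running the same kind of check against a sample of your real values and real request counts, rather than item counts, is closer to what the bullets above recommend.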
