
Retrieving a document by id is slow across partitions in Cosmos DB

I have a scenario where I need to retrieve a single document from Azure Cosmos DB based on its id property. The only problem is that I don't know the partition key and thus cannot use the document URI to access it.

From my understanding, writing a simple query like

SELECT * from c WHERE c.id = "id here"

should be the way to go, but I'm experiencing severe performance issues with this query. Most queries take 30s to 60s to complete and seem to consume insane amounts of RU/s. When executing 10 concurrent queries, the max RU/s per partition went as high as 30.000 (10.00 per partition was provisioned), resulting in throttling and even slower responses.

The collection comprises 10 partitions with around 3 Mb per partition, so 30 Mb in total and around 1,00,000 documents. My indexing policy looks like this:
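For reference, here is a minimal sketch of how such a lookup might be issued from client code. It assumes `container` behaves like the `ContainerProxy` object of the `azure-cosmos` Python SDK; the helper name `find_by_id` is hypothetical and not part of the original question.

```python
def find_by_id(container, doc_id):
    """Return the first document whose id matches, or None.

    Assumption: `container` behaves like azure.cosmos.ContainerProxy.
    """
    items = container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": doc_id}],
        # The partition key is unknown, so the query has to fan out
        # to every partition of the collection.
        enable_cross_partition_query=True,
    )
    return next(iter(items), None)
```

Passing the id as a parameter instead of concatenating it into the query string avoids quoting problems; it does not change the fact that every partition is consulted.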

{
    "indexingMode": "lazy",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                },
                {
                    "kind": "Hash",
                    "dataType": "String",
                    "precision": 3
                }
            ]
        }
    ],
    "excludedPaths": []
}

And the consistency is set to EVENTUAL since I don't really care about read/write order. The collection is under some write pressure, with about 30 writes per minute, and there's a TTL of 1 year for each document, yet this doesn't seem to have a measurable impact on the RU/s. I experience this sort of problem only when querying documents.

Has anyone had similar problems and can offer a fix or mitigation? Am I doing something wrong with my query or indexing policy? I don't know why my query is consuming so many resources.

I ran into a similar problem. My database is 16 GB with 2 partitions and has 10,000 RU per partition.

By gathering query metrics, I found that the query by id appears to do a full scan instead of an index lookup.

Here are the metrics for the query by id:

SELECT * FROM c where c.id = 'id-here'
--Read 1 record in 1497.00 ms, 339173.109 RU
--QueryPreparationTime(ms): CompileTime = 2, LogicalBuildTime = 0, 
     PhysicalPlanBuildTime = 0, OptimizationTime = 0
--QueryEngineTime(ms): DocumentLoadTime = 1126, IndexLookupTime = 0, 
     RuntimeExecutionTimes = 356, WriteOutputTime = 0

Notice that most of the time is spent in DocumentLoadTime, while IndexLookupTime = 0.

Meanwhile, a query by an indexed field is pretty fast.

SELECT * FROM c WHERE c.indexedField = 'value'
--Read 4 records in 2.00 ms, 7.56 RU
--QueryPreparationTime(ms): CompileTime = 0, LogicalBuildTime = 0, 
       PhysicalPlanBuildTime = 0, OptimizationTime = 0
--QueryEngineTime(ms): DocumentLoadTime = 0, IndexLookupTime = 1, 
       RuntimeExecutionTimes = 0, WriteOutputTime = 0

In contrast to the query by id, this one spends no time in DocumentLoadTime because the index was used; IndexLookupTime is 1 ms.

The problem is that id should be the primary key and should be indexed by default, but it looks like it isn't. You can't even add a custom indexing policy for it.

I have logged a ticket with Microsoft support and am waiting for clarification.

Update:

Microsoft support responded and they've resolved the issue. They've added IndexVersion 2 for the collection. Unfortunately, it is not yet available from the portal, and newly created accounts/collections are still not using the new version. You'll have to contact Microsoft Support to make changes to your accounts.

Here are the new results from a collection with index version 2; there's a massive improvement.
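The comparison above can be turned into a quick check. The following is only a sketch: it parses the `Name = value` pairs out of a metrics dump like the ones shown and flags the pattern where documents were loaded but the index did no measurable work. The function names are hypothetical, and the rule is a heuristic drawn from these two samples, not an official diagnostic.

```python
import re

def parse_engine_times(metrics_text):
    """Extract 'Name = number' pairs from a query-metrics dump into a dict."""
    pairs = re.findall(r"(\w+)\s*=\s*([\d.]+)", metrics_text)
    return {name: float(value) for name, value in pairs}

def looks_like_scan(times):
    """Heuristic: time was spent loading documents but none in the index."""
    return (times.get("IndexLookupTime", 0) == 0
            and times.get("DocumentLoadTime", 0) > 0)

by_id = """
--QueryEngineTime(ms): DocumentLoadTime = 1126, IndexLookupTime = 0,
     RuntimeExecutionTimes = 356, WriteOutputTime = 0
"""
by_indexed_field = """
--QueryEngineTime(ms): DocumentLoadTime = 0, IndexLookupTime = 1,
     RuntimeExecutionTimes = 0, WriteOutputTime = 0
"""

print(looks_like_scan(parse_engine_times(by_id)))             # True
print(looks_like_scan(parse_engine_times(by_indexed_field)))  # False
```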

SELECT * FROM c where c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 344,940.79 RUs
-- Index Version 2: Request Charge: 3.31 RUs

SELECT * FROM c WHERE c.indexedField = 'value' AND c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 150,666.22 RUs 
-- Index Version 2: Request Charge: 5.65 RUs
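To put those request charges in proportion, this short sketch divides the index version 1 charge by the index version 2 charge for each query. The numbers are the ones reported above; the labels and the helper name are only illustrative.

```python
# Request charges reported above, as (index version 1, index version 2).
charges = {
    "id only": (344_940.79, 3.31),
    "indexedField and id": (150_666.22, 5.65),
}

def improvement_factor(v1_charge, v2_charge):
    """How many times cheaper the query became under index version 2."""
    return v1_charge / v2_charge

for label, (v1, v2) in charges.items():
    print(f"{label}: about {improvement_factor(v1, v2):,.0f}x fewer RUs")
# id only: about 104,212x fewer RUs
# indexedField and id: about 26,667x fewer RUs
```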

My test DB has about 300k records. When I try to select by id only, like this

SELECT * FROM c where c.id = 'xxx'

it takes a lot of time and RU.

But when I include the partition key, like this

SELECT * FROM c where c.id = 'xxx' AND c.partitionField = 'yyy'

it's very fast.

So I think you need to restructure your database and think about which field to use as the partition key.

The key to Cosmos is to rethink the partition key. I don't know what you are using, but make it something that is readily available.

Recently I have been adding a 'Table' property to all of my documents, but you could very easily use the table name as the partition key! It's really almost like having a bunch of SQL tables just kind of floating around in the pudding that is a CosmosDB collection.
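When the partition key value is known, client code can target a single partition. The sketch below assumes `container` behaves like the `ContainerProxy` of the `azure-cosmos` Python SDK; the helper names are hypothetical.

```python
def read_by_id_and_partition(container, doc_id, partition_value):
    """Point read: fetch one document directly by id and partition key.

    Assumption: `container` behaves like azure.cosmos.ContainerProxy.
    """
    return container.read_item(item=doc_id, partition_key=partition_value)

def query_within_partition(container, doc_id, partition_value):
    """Same lookup expressed as a query scoped to a single partition."""
    items = container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": doc_id}],
        # Supplying the partition key keeps the query on one partition.
        partition_key=partition_value,
    )
    return next(iter(items), None)
```

Both helpers need the partition key value at call time, which is why the choice of partition field matters so much.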
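A toy model may make the idea concrete. This is an in-memory stand-in, not Cosmos DB: `ToyCollection` and the table names are made up for illustration, and each distinct 'Table' value gets its own dict, the way each partition key value maps to one logical partition.

```python
class ToyCollection:
    """In-memory stand-in: one dict per partition-key value. Not Cosmos DB."""

    def __init__(self, partition_property="Table"):
        self.partition_property = partition_property
        self.partitions = {}

    def upsert(self, doc):
        key = doc[self.partition_property]
        self.partitions.setdefault(key, {})[doc["id"]] = doc

    def get(self, doc_id, table):
        """Partition key known: exactly one partition is consulted."""
        return self.partitions.get(table, {}).get(doc_id)

    def find_anywhere(self, doc_id):
        """Partition key unknown: partitions are checked one by one."""
        checked = 0
        for docs in self.partitions.values():
            checked += 1
            if doc_id in docs:
                return docs[doc_id], checked
        return None, checked

store = ToyCollection()
store.upsert({"id": "1", "Table": "orders"})
store.upsert({"id": "2", "Table": "customers"})
store.upsert({"id": "3", "Table": "invoices"})

print(store.get("3", "invoices"))   # {'id': '3', 'Table': 'invoices'}
print(store.find_anywhere("3"))     # ({'id': '3', 'Table': 'invoices'}, 3)
```

With the table name in hand, the lookup touches one partition; without it, the search walks through all of them, which mirrors the cost difference described in the answers above.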

Technical posts on this site are licensed under CC BY-SA 4.0; if you republish them, please credit this site or the original source.