Question about partitioning strategy with multi-tenant setup
I've been watching some videos and reading articles about Azure Cosmos DB, and I think it will be a good fit for my app. I am currently using SQL Server, but I think I will soon reach its limits, and I am already seeing issues where queries from several large customers cause queries from other, smaller customers to be throttled. I'm setting up a new database for my app, and I want to make sure I am setting it up correctly.
I want to find a good partition key strategy and container strategy, while maintaining a good separation between all customers. My knowledge in this area is limited, as I come from a SQL background (for the past 15 years or so). Here are some specifics for the app and the data it stores:
Is it a good idea to use the account id for the partition key? Or can I just use the account id for the container, and something else as the partition key? What is a good strategy in this situation?
If it helps, this is an example of what the call records look like:
{
    "accountId": "9153849867",
    "id": "I8uToEX1hjmwzUA",
    "type": "Voice",
    "direction": "Inbound",
    "action": "Phone Call",
    "result": "Accepted",
    "callTime": "2020-05-26T16:58:14.675Z",
    "duration": 235,
    "callers": [
        {
            "phoneNumber": "7537547442",
            "extensionNumber": null,
            "location": "Edina, MN",
            "name": "WIRELESS CALLER",
            "toInd": false,
            "legInd": false,
            "extensionInd": false
        },
        {
            "phoneNumber": "2564572486",
            "extensionNumber": null,
            "location": null,
            "name": null,
            "toInd": true,
            "legInd": false,
            "extensionInd": false
        }
    ],
    "files": {
        /*More data here that does not need to be searched*/
    }
}
It is impossible to answer this with much precision, because ultimately you will need to measure with load tests on both the write and query paths. I would think about the following.
Is this a write-heavy or read-heavy application? If write-heavy, optimize your partition strategy around distributing writes, using a partition key with high cardinality. If read-heavy, optimize around single-partition reads/queries, or bounded partition reads/queries. If it's both, look at using Change Feed, with one container optimized for writes and a second optimized to serve queries. However, you'll need to weigh the cost of two containers and of using Change Feed to copy data when deciding whether this is more efficient than serving queries from a single container.
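To make the point about high-cardinality keys and write distribution concrete, here is a rough simulation. It is only a sketch: the workload (one large account producing 9,000 of 10,000 writes), the count of 10 partitions, and the MD5-based `bucket` function are all assumptions for illustration, and `bucket` is a stand-in, not the hashing Cosmos DB actually uses.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 10  # hypothetical number of physical partitions


def bucket(partition_key: str, partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a bucket (stand-in for the real hashing)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def max_share(keys) -> float:
    """Fraction of all writes that land on the busiest bucket."""
    counts = Counter(bucket(k) for k in keys)
    return max(counts.values()) / sum(counts.values())


# Hypothetical workload: one large account produces 9,000 of 10,000 writes.
writes = [("big-account", f"doc-{i}") for i in range(9000)]
writes += [(f"small-account-{i % 100}", f"doc-{9000 + i}") for i in range(1000)]

# Keyed by account id, every write from the large account hits one bucket.
by_account = max_share(account for account, _ in writes)
# Keyed by a high-cardinality value (the document id), writes spread out.
by_document = max_share(doc_id for _, doc_id in writes)

print(f"busiest bucket, keyed by account id:  {by_account:.0%}")
print(f"busiest bucket, keyed by document id: {by_document:.0%}")
```

Keyed by account id, the busiest bucket takes at least the large account's full share of writes, while the high-cardinality key spreads writes roughly evenly. The trade-off is that a key like the document id no longer keeps one customer's data in a single partition for queries.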
The next consideration is storage. For whatever partition key you choose (let's say account id), how fast do you get to the maximum partition size of 20 GB? Do you TTL data at some point? If so, is it before you reach the 20 GB maximum?
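A back-of-the-envelope calculation can answer the storage questions above. This is a minimal sketch under stated assumptions: the average document size and daily call volume are made-up figures, 20 GB is taken as 20 * 1024^3 bytes, and index overhead is ignored, so real numbers will differ.

```python
# 20 GB partition limit from the text, taken here as binary gigabytes (assumption).
PARTITION_LIMIT_BYTES = 20 * 1024**3


def days_until_full(avg_doc_bytes: int, docs_per_day: int) -> float:
    """Days until one partition key value accumulates 20 GB with no TTL."""
    return PARTITION_LIMIT_BYTES / (avg_doc_bytes * docs_per_day)


def fits_with_ttl(avg_doc_bytes: int, docs_per_day: int, ttl_days: int) -> bool:
    """True if the data retained under a TTL stays below the 20 GB limit."""
    return avg_doc_bytes * docs_per_day * ttl_days < PARTITION_LIMIT_BYTES


# Hypothetical large account: 2 KB call records, 50,000 calls per day.
print(round(days_until_full(2048, 50_000), 1))   # 209.7
print(fits_with_ttl(2048, 50_000, ttl_days=90))   # True
print(fits_with_ttl(2048, 50_000, ttl_days=365))  # False
```

With these made-up figures, an account keyed by account id alone would fill its partition in about seven months unless a TTL of roughly 200 days or less is applied.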
Given the asymmetric load between accounts, account id may not be a good partition key, as it will result in an uneven distribution of writes. Since throughput is distributed evenly across partitions, this is inefficient and likely won't perform well under heavy load. It could also result in larger accounts hitting the 20 GB limit quickly. For queries, though, it's likely a good candidate, as I'm sure customers query for their own data. You could consider using account id with a time element as a synthetic key. But you'd then need to time-bound your queries so you aren't doing a fan-out across all partitions.
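One possible shape for such a synthetic key is sketched below. The month granularity, the `accountId-YYYY-MM` format, and the helper names `synthetic_key` and `keys_for_range` are assumptions for illustration, not something Cosmos DB prescribes. The right time bucket depends on your write volume and on how far back queries typically look.

```python
from datetime import datetime
from typing import List


def synthetic_key(account_id: str, call_time: str) -> str:
    """Build a partition key value such as '9153849867-2020-05'."""
    # callTime in the sample record is ISO 8601 with a trailing 'Z'.
    moment = datetime.strptime(call_time, "%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{account_id}-{moment:%Y-%m}"


def keys_for_range(account_id: str, start: datetime, end: datetime) -> List[str]:
    """List every partition key value a time-bounded query has to touch."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{account_id}-{year:04d}-{month:02d}")
        # Step to the next month, rolling over at year end.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


print(synthetic_key("9153849867", "2020-05-26T16:58:14.675Z"))
# 9153849867-2020-05
print(keys_for_range("9153849867", datetime(2020, 11, 15), datetime(2021, 1, 10)))
# ['9153849867-2020-11', '9153849867-2020-12', '9153849867-2021-01']
```

A query for one account over a bounded date range then targets only the listed key values instead of fanning out, and each account's writes are spread over one partition key value per month.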
Not sure any of this is helpful. Partition strategy is difficult. You have to deeply understand your access patterns to know which writes and reads are high volume. Measure those and the right key should emerge.