Question about partition strategy for a multi-tenant setup

Question

I've been watching videos and reading articles about Azure Cosmos DB, and I think it will be a good fit for my app. I am currently using SQL Server, but I think I will soon reach its limits, and I am already seeing queries from several large customers cause throttling of queries from smaller customers. I'm setting up a new database for my app, and I want to make sure I set it up correctly.

I want to find a good partition key strategy and container strategy, while maintaining good separation between all customers. My knowledge in this area is limited, as I come from a SQL background (for the past 15 years or so). Here are some specifics for the app and the data it stores:

My app saves phone call records and text/SMS records for my customers.
Some customers are big, and the app saves around 10,000 call records per day for these customers.
Some customers are small, and the app saves around 10 call records per day for these customers (other customers fall somewhere in between).
As of now, after running the app for around 5 years, there are around 42,000,000 call and text records in total in the SQL Server database across all customers, and it grows daily. The current storage size for the entire database is 80 GB.
On average, the app adds around 65,000 new records per day across all customers.
The app is much more write-heavy than read-heavy, with most queries using a date range and free-text searching of the phone numbers.
The app has great potential to grow, and we may have a single customer coming on that will double this volume (this customer alone will have around 65,000 calls per day, bringing the total to 130,000 call records per day).
I need to maintain the text searching capabilities that are currently in the app (free-text searching of these fields: phone number, names, text message content).

Is it a good idea to use the account id as the partition key? Or can I use the account id for the container, and something else as the partition key? What is a good strategy in this situation?

If it helps, this is an example of what the call records look like:

{
  "accountId": "9153849867",
  "id": "I8uToEX1hjmwzUA",
  "type": "Voice",
  "direction": "Inbound",
  "action": "Phone Call",
  "result": "Accepted",
  "callTime": "2020-05-26T16:58:14.675Z",
  "duration": 235,
  "callers": [
    {
      "phoneNumber": "7537547442",
      "extensionNumber": null,
      "location": "Edina, MN",
      "name": "WIRELESS CALLER",
      "toInd": false,
      "legInd": false,
      "extensionInd": false
    },
    {
      "phoneNumber": "2564572486",
      "extensionNumber": null,
      "location": null,
      "name": null,
      "toInd": true,
      "legInd": false,
      "extensionInd": false
    }
  ],
  "files": {
    /*More data here that does not need to be searched*/
  }
}

Answer 1

It is impossible to answer this with much precision, because ultimately you will need to measure with load tests on both the write and query paths. I would think about the following.

Is this a write-heavy or read-heavy application? If write-heavy, optimize your partition strategy around distributing writes, using a partition key with high cardinality. If read-heavy, optimize around single-partition reads/queries, or reads/queries bounded to a small number of partitions. If it's both, look at using Change Feed with one container optimized for writes and a second container optimized to serve queries. However, you'll need to weigh the cost of two containers and of using change feed to copy data when deciding whether this is more efficient than serving queries from a single container.

The next consideration is storage. For whatever partition key you choose (let's say account id), how fast do you get to the maximum partition size of 20 GB? Do you TTL data at some point? If so, does the data expire before you reach the 20 GB maximum?
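To make the storage question concrete, here is a rough back-of-the-envelope sketch using only the figures quoted in the question (80 GB and 42,000,000 records) and the 20 GB partition limit mentioned above. It assumes the average record size in Cosmos DB would match the SQL Server average, which is unlikely to hold exactly since indexes and JSON overhead differ, so treat the result as an order-of-magnitude estimate. The function name `days_until_full` is made up for illustration.

```python
# Figures quoted in the question (SQL Server totals, used here as a rough proxy).
TOTAL_BYTES = 80 * 1024**3               # 80 GB for the whole database
TOTAL_RECORDS = 42_000_000               # call and text records after ~5 years
PARTITION_LIMIT_BYTES = 20 * 1024**3     # 20 GB per partition key value


def days_until_full(records_per_day: int) -> float:
    """Estimate days until one partition key value holds 20 GB.

    Assumption: every record has the average size implied by the totals above.
    """
    bytes_per_record = TOTAL_BYTES / TOTAL_RECORDS   # roughly 2 KB
    bytes_per_day = records_per_day * bytes_per_record
    return PARTITION_LIMIT_BYTES / bytes_per_day


if __name__ == "__main__":
    for label, volume in [("small", 10), ("big", 10_000), ("prospective", 65_000)]:
        print(f"{label:>11}: {days_until_full(volume):,.0f} days")
```

Under these assumptions, an account writing 65,000 records per day would fill a partition keyed on account id alone in roughly 160 days, while a 10,000 record per day account would take close to three years. Without a TTL, the largest accounts would hit the limit well within the lifetime of the app.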

Given the asymmetric load between accounts, account id may not be a good partition key, as it will result in an uneven distribution of writes. Since throughput is distributed evenly across partitions, this is inefficient and likely won't perform well under heavy load. It could also result in larger accounts hitting the 20 GB limit quickly. For queries, though, it's likely a good candidate, as I'm sure customers query for their own data. You could consider using account id combined with a time element as a synthetic key. But you'd then need to time-bound your queries so you aren't doing a fan-out across all partitions.
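A minimal sketch of the synthetic key idea, assuming a monthly time bucket and an underscore separator (both are illustrative choices, not a recommendation from the answer). The helper names `partition_key` and `keys_for_range` are hypothetical. The first builds the key from the `accountId` and `callTime` fields in the sample record; the second lists the key values that a date-bounded query for one account would need to target, which is what keeps the query from fanning out across all partitions.

```python
from datetime import datetime
from typing import List


def partition_key(account_id: str, call_time: str) -> str:
    """Combine account id with the call's year and month, e.g. '9153849867_2020-05'."""
    # callTime in the sample record is ISO 8601 with a trailing 'Z' (UTC).
    ts = datetime.fromisoformat(call_time.replace("Z", "+00:00"))
    return f"{account_id}_{ts:%Y-%m}"


def keys_for_range(account_id: str, start: datetime, end: datetime) -> List[str]:
    """List the partition key values covering every month from start to end, inclusive."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{account_id}_{year:04d}-{month:02d}")
        # Advance one month, rolling over to January of the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys


if __name__ == "__main__":
    print(partition_key("9153849867", "2020-05-26T16:58:14.675Z"))
    print(keys_for_range("9153849867", datetime(2019, 11, 15), datetime(2020, 2, 1)))
```

The bucket size is a trade-off: a finer bucket (day) keeps each key value smaller but means a date-range query touches more key values, while a coarser bucket (year) does the opposite. The right granularity depends on the measured record size and the date ranges customers typically search.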

Not sure any of this is helpful. Partition strategy is difficult. You have to deeply understand your access patterns to know which writes and reads are high volume. Measure those and the right key should emerge.

Solution 1
0 2020-05-31 18:02:54
