
AWS DynamoDB Partition Key Design

I read this answer, which clarified a lot of things, but I'm still confused about how I should go about designing my primary key.

First off, I want to clarify the idea of WCUs. I understand that a WCU is a write capacity of at most 1 KB per second. Does that mean that if writing a piece of data takes 0.25 seconds, I would need 4 of those writes to be billed 1 WCU? Or does each write consume 1 WCU, yet I could also write X times within 1 second and still be billed 1 WCU?

Usage

I want to create a table that stores the form data for a set of gyms (95% will be waivers, the rest will be incident reports). Most of the time, each form will be accessed directly via its unique ID. I also want to query the forms by date, form, userId, etc.

We can assume an average of 50k forms per gym.

Options

  • First option is straightforward: make formId the partition key. What I don't like about this option is that scan operations will always filter out 90% of the data (i.e. the forms from other gyms), which isn't good for RCUs.

  • Second option is to make gymId the partition key and add a sort key for the date, formId, and userId. To implement this option I would need to know more about the implications of having 50k records under one partition key.

  • Third option is to have one table per gym, with formId as the partition key. This seems like the best option for now, but I don't really like the idea of having a large number of tables doing the same thing in my account.

Is there another option? Which one of the three is better?

Edit: I'm assuming another option would be SimpleDB?

For your PK design: what data does the app have when a user is going to look for a form? Does it have the GymID, userID, and formID? If so, perhaps make a composite key out of those for the PK. Your PK might then look like:

234455::53894302::245 

Where 234455 is the GymID, 53894302 is the user's ID, and 245 is the form ID. You might even move the form ID into the sort key and, along with a date, have an SK of form::245::. Then you could easily get all items of type form for that user, all form 245s for that user, or all form 245s in 2020 for that user, by using the begins_with() expression in your Query.
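To make the composite-key idea concrete, here is a minimal local sketch in Python. It assumes attributes named PK and SK, a sort key laid out as form::<formId>::<ISO date>, and hypothetical helpers make_pk, make_sk and query_begins_with; nothing here calls AWS. The filter stands in for what a real Query would do with a KeyConditionExpression such as `Key("PK").eq(pk) & Key("SK").begins_with(prefix)` from `boto3.dynamodb.conditions`.

```python
from datetime import date

SEP = "::"  # delimiter used in the example keys above


def make_pk(gym_id: int, user_id: int) -> str:
    """Hypothetical partition key: gym ID and user ID joined with '::'."""
    return f"{gym_id}{SEP}{user_id}"


def make_sk(form_id: int, submitted: date) -> str:
    """Hypothetical sort key 'form::<formId>::<YYYY-MM-DD>'.

    Putting the form ID before the date lets a prefix narrow by form first,
    then by year/month/day."""
    return f"form{SEP}{form_id}{SEP}{submitted.isoformat()}"


def query_begins_with(items: list, pk: str, sk_prefix: str) -> list:
    """Local stand-in for Query: PK equals pk AND begins_with(SK, sk_prefix).

    Results come back ordered by sort key, as a DynamoDB Query returns them."""
    hits = [it for it in items if it["PK"] == pk and it["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda it: it["SK"])


items = [
    {"PK": make_pk(234455, 53894302), "SK": make_sk(245, date(2020, 3, 1))},
    {"PK": make_pk(234455, 53894302), "SK": make_sk(245, date(2021, 7, 9))},
    {"PK": make_pk(234455, 53894302), "SK": make_sk(310, date(2020, 5, 2))},
    {"PK": make_pk(234455, 11111111), "SK": make_sk(245, date(2020, 1, 1))},
]

pk = make_pk(234455, 53894302)
all_forms = query_begins_with(items, pk, "form::")                # every form for this user
form_245 = query_begins_with(items, pk, "form::245::")            # only form 245
form_245_2020 = query_begins_with(items, pk, "form::245::2020")   # form 245 submitted in 2020
print(len(all_forms), len(form_245), len(form_245_2020))          # 3 2 1
```

Keeping the trailing :: in a prefix such as form::245:: matters: a bare form::24 would also match forms 245 and 2450.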

This might not be exactly what you should do, but play with it and see what options you come up with. One thing to think about is what happens when a user moves to a different gym. Perhaps, in that rare event, you rewrite all of their items in the DB with the new gymID. Or perhaps you do not put the gymID in the PK at all. Without a lot more info, it is difficult to say. Hopefully this is enough for you to chew on so you can come up with a solution.
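If the gymID stays in the PK, the "user moves gyms" case means rewriting that user's items. DynamoDB does not let UpdateItem change a key attribute, so each item has to be written under its new key and the old item deleted. A rough sketch of building those writes, assuming the PK layout gymId::userId and a hypothetical helper rekey_for_new_gym (plain Python, no AWS calls):

```python
SEP = "::"  # same delimiter as the example PK


def rekey_for_new_gym(items, user_id, old_gym_id, new_gym_id):
    """Return (puts, deletes) that move one user's items to a new gym.

    Key attributes cannot be updated in place, so every matching item is copied
    under the new partition key, and its old key is queued for deletion."""
    old_pk = f"{old_gym_id}{SEP}{user_id}"
    new_pk = f"{new_gym_id}{SEP}{user_id}"
    puts, deletes = [], []
    for item in items:
        if item["PK"] != old_pk:
            continue  # belongs to another user or another gym
        moved = dict(item, PK=new_pk)  # shallow copy with the new partition key
        if "gymId" in moved:
            moved["gymId"] = new_gym_id  # keep a denormalized gymId attribute in sync
        puts.append(moved)
        deletes.append({"PK": item["PK"], "SK": item["SK"]})  # full primary key of the old item
    return puts, deletes


items = [
    {"PK": "234455::53894302", "SK": "form::245::2020-03-01", "gymId": "234455"},
    {"PK": "234455::53894302", "SK": "form::310::2020-05-02", "gymId": "234455"},
    {"PK": "234455::11111111", "SK": "form::245::2020-01-01", "gymId": "234455"},
]
puts, deletes = rekey_for_new_gym(items, "53894302", "234455", "998877")
```

Each put/delete pair could then be sent together, for example in a TransactWriteItems call, so that each item's move is atomic; note that transactional writes consume 2 WCUs per item.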

Every call that writes to DDB consumes at least 1 (standard) or 2 (transactional) WCUs, assuming your items are less than 1 KB in size.

See the Provisioned Throughput key point:

Item sizes for writes are rounded up to the next 1 KB multiple. For example, writing a 500-byte item consumes the same throughput as writing a 1 KB item.

So writing 4 items in one second will require 4 WCUs. But "burst" capacity means you might temporarily be able to write 4 items per second, for a short period of time, on a table that's only provisioned for 2 WCUs.
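Back to the WCU question: billing depends on how many writes you make and how large each item is, not on how long a write takes. A small sketch of that arithmetic, assuming 1 KB = 1,024 bytes and ignoring burst capacity and on-demand mode; the function names are hypothetical:

```python
import math

WCU_BYTES = 1024  # one WCU covers one write of up to 1 KB


def wcus_for_write(item_size_bytes: int, transactional: bool = False) -> int:
    """WCUs for a single write: size rounded up to whole KB, doubled for transactions."""
    units = math.ceil(item_size_bytes / WCU_BYTES)
    return units * (2 if transactional else 1)


def wcus_per_second(item_sizes: list, transactional: bool = False) -> int:
    """Sustained WCUs needed to perform all of these writes within one second."""
    return sum(wcus_for_write(size, transactional) for size in item_sizes)


print(wcus_for_write(500))                      # 1: a 500-byte item costs the same as 1 KB
print(wcus_for_write(1500))                     # 2: rounded up to 2 KB
print(wcus_for_write(500, transactional=True))  # 2: transactional writes cost double
print(wcus_per_second([500, 500, 500, 500]))    # 4: four small writes in one second
```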

As for your proposed options: it depends. You mentioned some general access patterns, but not specifics, nor whether those are the only ones you need.

In an RDBMS, you have to know ahead of time how you want to store the data. But accessing that data is very flexible.

In DDB, you have to know how you need to access the data, but the storage structure is flexible.

Some general feedback:

  • Scan() is an operation of last resort; it should be used very, very infrequently, if at all.
  • 50K records for a partition key isn't a big deal. What matters (though less than it used to) is how evenly your accesses are distributed across your partition keys. Ideally, you want a uniform distribution of access across all your partition keys.
  • One table per gym is a valid multi-tenant strategy, but there are management/overhead costs.
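To put numbers on why Scan() is a last resort here: a Scan is billed for every item it reads, and a filter expression is applied only afterwards, while a Query on PK = gymId reads just that gym's items. The sketch below assumes 10 gyms (matching the question's "filter out 90%"), 50k forms per gym, 1 KB items and eventually consistent reads, and it ignores the 1 MB per-page limit; the figures are illustrative, not measured.

```python
import math

RCU_BYTES = 4096  # one RCU = one strongly consistent read of up to 4 KB


def rcus_for_read_op(bytes_read: int, strongly_consistent: bool = False) -> float:
    """RCUs for a Query/Scan: total bytes read rounded up to 4 KB;
    eventually consistent reads cost half."""
    units = math.ceil(bytes_read / RCU_BYTES)
    return float(units) if strongly_consistent else units / 2


GYMS = 10               # hypothetical gym count
FORMS_PER_GYM = 50_000  # from the question
ITEM_BYTES = 1024       # hypothetical average item size

# Scan filtered on gymId: every item in the table is read (and billed) before filtering.
scan_rcus = rcus_for_read_op(GYMS * FORMS_PER_GYM * ITEM_BYTES)
# Query with PK = gymId: only that one gym's items are read.
query_rcus = rcus_for_read_op(FORMS_PER_GYM * ITEM_BYTES)

print(scan_rcus, query_rcus)  # 62500.0 6250.0
```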

Assuming you actually have multiple tenants, i.e. each gym is an individual customer, I'd lean toward making gymid the hash key so I could take advantage of enforcing tenant isolation via IAM roles, as outlined in this article.
Caveat: this could be problematic if tenants are NOT of approximately the same size. But again, that is less of a problem than it originally was.
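For the tenant-isolation idea, a common building block is the dynamodb:LeadingKeys IAM condition key, which restricts a caller to items whose partition key matches an allowed value. With StringEquals the comparison is against the exact partition key value, which is why having gymid alone as the hash key fits well. A minimal sketch that builds such a policy document, assuming a hypothetical table ARN and a hypothetical tenant_policy helper; the linked article describes its own way of attaching policies to tenants, which this does not reproduce:

```python
import json


def tenant_policy(table_arn: str, gym_id: str) -> dict:
    """IAM policy allowing item-level access only where the partition key equals gym_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Query",
                ],
                "Resource": table_arn,
                # Every partition key value the request touches must equal this tenant's gym ID.
                "Condition": {
                    "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [gym_id]}
                },
            }
        ],
    }


# Hypothetical account ID and table name, for illustration only.
policy = tenant_policy("arn:aws:dynamodb:us-east-1:123456789012:table/GymForms", "234455")
print(json.dumps(policy, indent=2))
```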
