How to do a global query on a DynamoDB table?

Question

DynamoDB is a key-value database, and a query requires a partition key. I am saving user data in a DynamoDB table, and I'd like to know the best way to do a global search.

My table includes these fields:

id (PK)
firstName
lastName
email
phone
company ( GSI PK )

The id is the partition key for the table, and company is the partition key for a GSI. I usually use id to query an individual user, and use the company GSI to query the users under a company, with pagination.

Now I have a requirement to query users globally (without specifying any company). I am not sure how to support this, since the query doesn't include any partition key. And I don't want to use a scan, since it is too expensive.

One solution I can think of is to create a separate field that has the same fixed value for all items, and create a GSI on this field. That way, I can use the fixed value as the PK to query all users. But it would create a hot partition, which I want to avoid. Is there any other way to do that?

Answer 1

> One solution I can think of is to create a separate field that has the same fixed value for all items, and create a GSI on this field. That way, I can use the fixed value as the PK to query all users. But it would create a hot partition, which I want to avoid. Is there any other way to do that?

You're on the right track here, and your concern about a hot partition is also spot-on. To solve it, we can make use of bucketing.

First, I understand that your access pattern looks something like this: getUserByUsername(username: str).

That means you know the username you're looking for. To solve the hot-partition problem, you could calculate a separate partition key value (gsi2_pk) for the GSI based on the username, e.g. take the first two characters.

That means the table layout could be something like this:

gsi2_pk	gsi2_sk
jo	joey yi zhao
jo	johnny b goode
ma	maurice

This way you distribute your users across a lot more partitions.
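As a rough illustration of the prefix approach, here is a minimal Python sketch that derives the two GSI attributes from a username. The helper names (prefix_bucket, gsi2_keys) are hypothetical, and lowercasing the username before taking the prefix is an assumption of this sketch, not something the layout above requires.

```python
def prefix_bucket(username: str, length: int = 2) -> str:
    """Return the first `length` characters of the normalized username."""
    # Assumption: usernames are trimmed and lowercased before bucketing,
    # so "Joey" and "joey" land in the same bucket.
    return username.strip().lower()[:length]


def gsi2_keys(username: str) -> dict:
    """Build the GSI key attributes to store alongside the user item."""
    normalized = username.strip().lower()
    return {
        "gsi2_pk": prefix_bucket(normalized),  # bucket, e.g. first two characters
        "gsi2_sk": normalized,                 # full username as the sort key
    }


if __name__ == "__main__":
    for name in ["joey yi zhao", "johnny b goode", "maurice"]:
        print(gsi2_keys(name))
```

When looking a user up, the same function would be applied to the requested username to find the partition key to query.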

The drawback here is that your usernames probably wouldn't be evenly distributed across the buckets, so you may inadvertently create new hot partitions. Another approach would be to have a fixed number of buckets (n) and set gsi2_pk to hash(username) % n, which distributes the items more evenly.
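The hashed variant could be sketched as follows. Note that Python's built-in hash() is salted per process for strings, so this sketch uses hashlib to get a bucket that is stable across runs. The bucket count, table name, index name, and helper names are all assumptions for illustration, and the bucket is stored as a string attribute here. The returned dictionary follows the shape of the low-level DynamoDB Query parameters and is only built, not sent.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical value of n; pick it based on expected load


def hash_bucket(username: str, n: int = NUM_BUCKETS) -> int:
    """Map a username to a stable bucket number in the range [0, n)."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo n.
    return int.from_bytes(digest[:8], "big") % n


def build_query_params(username: str,
                       table_name: str = "users",   # hypothetical table name
                       index_name: str = "gsi2") -> dict:  # hypothetical index name
    """Build Query parameters for looking up one user through the GSI."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "gsi2_pk = :pk AND gsi2_sk = :sk",
        "ExpressionAttributeValues": {
            ":pk": {"S": str(hash_bucket(username))},  # recomputed bucket
            ":sk": {"S": username},                    # exact username
        },
    }


if __name__ == "__main__":
    print(hash_bucket("maurice"))
    print(build_query_params("maurice"))
```

Because the bucket is recomputed from the username at read time, a lookup still touches a single partition. Changing n later would move items between buckets, so the stored gsi2_pk values would need to be rewritten.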

Answer 1
0 Accepted 2022-04-14 15:40:22