
How to do a global query on a DynamoDB table?

DynamoDB is a key-value database, and a query requires a partition key. I am saving user data in a DynamoDB table, but I'd like to know the best way to do a global search.

My table includes these fields:

id (PK)
firstName
lastName
email
phone
company ( GSI PK )

The id is the partition key for the table, and company is the PK for a GSI. I usually use the id to query an individual user and use the company GSI to query the users under a company, with pagination.

Now I have a requirement to query users globally (without any company). I am not sure how to support this, since the query doesn't have any PK. And I don't want to use a scan, since it is too expensive.

One solution I can think of is to create a separate field that has the same fixed value for all items, and create a GSI on this field. That way, I can use the fixed value as the PK to query all users. But that would create a hot partition, which I want to avoid. Is there any other way to do this?


You're on the right track here, and your concern about a hot partition is also spot-on. One way to address it is bucketing.

First, I understand that your access pattern looks something like this: getUserByUsername(username: str).

That means you know the username you're looking for. To avoid the hot partition, you could calculate a separate partition key value (gsi2_pk) for the GSI based on the username, e.g. take its first two characters.

That means the table layout could be something like this:

gsi2_pk   gsi2_sk
jo        joey yi zhao
jo        johnny b goode
ma        maurice

This way you distribute your users across a lot more partitions.
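As a rough sketch of the prefix approach, the two GSI attributes could be derived like this. The function names are made up for illustration, and lowercasing and trimming the username is an assumption (it keeps "Jo" and "jo" in the same bucket), not something the table layout above requires.

```python
def prefix_bucket(username: str, length: int = 2) -> str:
    """Return the first `length` characters of the normalized username."""
    # Assumption: usernames are compared case-insensitively.
    normalized = username.strip().lower()
    if not normalized:
        raise ValueError("username must not be empty")
    return normalized[:length]


def gsi2_attributes(username: str) -> dict:
    """Build the extra attributes to store on the user item for the GSI."""
    normalized = username.strip().lower()
    return {
        "gsi2_pk": prefix_bucket(username),  # partition key of the GSI
        "gsi2_sk": normalized,               # sort key of the GSI
    }


if __name__ == "__main__":
    for name in ("joey yi zhao", "johnny b goode", "maurice"):
        print(gsi2_attributes(name))
```

You would write these attributes alongside the rest of the item whenever a user is created or renamed, so the index stays in sync with the username.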

The drawback here is that your usernames probably won't be evenly distributed across the buckets, so you may inadvertently create other hot partitions. Another approach is to use a fixed number of buckets (n) and set gsi2_pk to hash(username) % n, which distributes the items more evenly.
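A minimal sketch of the hashed variant follows. One detail matters in Python specifically: the built-in hash() of a string is salted per process by default, so it can't be used for a value that gets persisted. The sketch uses a SHA-256 digest instead. The bucket count of 16, the table name users, and the index name gsi2 are all assumptions for illustration.

```python
import hashlib

NUM_BUCKETS = 16  # assumed value of n; pick it based on your traffic


def hash_bucket(username: str, n: int = NUM_BUCKETS) -> int:
    """Map a username to a stable bucket number in the range [0, n)."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    # The first 8 bytes are plenty for picking a bucket.
    return int.from_bytes(digest[:8], "big") % n


def lookup_params(username: str, n: int = NUM_BUCKETS) -> dict:
    """Build Query parameters that find one user through the GSI."""
    return {
        "TableName": "users",   # hypothetical table name
        "IndexName": "gsi2",    # hypothetical index name
        "KeyConditionExpression": "gsi2_pk = :pk AND gsi2_sk = :sk",
        "ExpressionAttributeValues": {
            ":pk": {"S": str(hash_bucket(username, n))},
            ":sk": {"S": username},
        },
    }


if __name__ == "__main__":
    print(hash_bucket("maurice"))
    print(lookup_params("maurice"))
```

The dictionary has the shape expected by the low-level Query operation, so with boto3 it could be passed as client.query(**lookup_params("maurice")). Keep in mind that a lookup by username only touches one bucket, whereas listing every user would need one query per bucket, and changing n later means recomputing gsi2_pk for existing items.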

