
Optimization of Partition Key in DynamoDB

Original post: 2016-04-27 13:15:49 | Tags: amazon-web-services, amazon-dynamodb, nosql

I have a Messages table in DynamoDB. It has four columns: sender, timestamp, message, and recipient. Instead of using any one of the four columns as the partition key, I was wondering why not create another column for partitioning purposes that concatenates sender, timestamp, and recipient.

So this column would hold data like JohnSmithID1461754484307SallyMcDonaldID.

By doing this, when searching for messages from a particular sender and recipient combination, I could query on just this one column using conditions like "begins with" and "ends with". There are a few other ways of utilizing this column as well.

Question 1. Am I overcomplicating things here by trying to use one column instead of spreading my query across a few columns?

Question 2. Is there a noticeable performance benefit in taking this direction?

Question 3. Is this design pattern only worthwhile if I eliminate the SenderId and RecipientID columns for data size purposes? (I need the timestamp column for the sort key.)

2 Answers

I think you should read again how DynamoDB partition keys work. You cannot run queries like "begins with" or "ends with" on partition keys, because you have to provide the full partition key for a query. You may only provide such a condition on the sort key (and note that there is a begins_with function, but no ends_with function).
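To make the rule concrete, here is a minimal sketch of the parameters for a Query request in the low-level API shape. The table name Messages comes from the question, but the key schema is an assumption made for illustration only: a partition key named sender and a hypothetical string sort key named recipient_ts of the form "recipient#timestamp". The code only builds the request dictionary and does not contact AWS.

```python
def build_query_params(sender_id, recipient_prefix):
    """Build Query parameters: equality on the partition key,
    begins_with on the sort key.

    Assumed (hypothetical) schema: partition key 'sender',
    sort key 'recipient_ts' shaped like 'SallyMcDonaldID#1461754484307'.
    """
    return {
        "TableName": "Messages",
        # The partition key must be matched by equality; only the sort key
        # accepts a condition such as begins_with. There is no ends_with.
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#pk": "sender", "#sk": "recipient_ts"},
        "ExpressionAttributeValues": {
            ":pk": {"S": sender_id},
            ":prefix": {"S": recipient_prefix},
        },
    }


params = build_query_params("JohnSmithID", "SallyMcDonaldID#")
print(params["KeyConditionExpression"])
```

If boto3 is available, a dictionary like this could be passed as boto3.client("dynamodb").query(**params), assuming a table with that schema actually exists.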

Your idea might be based on using scans instead of queries, but (regarding question 2) this would consume a lot more read capacity and perform badly, because DynamoDB has to look at every item in the table. If you want more query flexibility, you could define one or more secondary indexes.
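The difference between a query and a scan can be illustrated with a toy in-memory model. This is not how DynamoDB is implemented, and the class and attribute names are made up for the sketch. It only counts how many items each access path has to examine when items are grouped by partition key.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a partitioned table (illustration only)."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def put(self, partition_key, item):
        self.partitions[partition_key].append(item)

    def query(self, partition_key, predicate):
        # A query goes straight to one partition and examines only its items.
        candidates = self.partitions.get(partition_key, [])
        matches = [item for item in candidates if predicate(item)]
        return matches, len(candidates)

    def scan(self, predicate):
        # A scan has to examine every item in every partition.
        matches, examined = [], 0
        for items in self.partitions.values():
            for item in items:
                examined += 1
                if predicate(item):
                    matches.append(item)
        return matches, examined


table = ToyTable()
for s in range(100):
    for r in range(10):
        table.put(f"sender{s}", {"sender": f"sender{s}", "recipient": f"r{r}"})

_, examined_by_query = table.query("sender42", lambda i: i["recipient"] == "r3")
_, examined_by_scan = table.scan(
    lambda i: i["sender"] == "sender42" and i["recipient"] == "r3"
)
print(examined_by_query, examined_by_scan)  # 10 1000
```

Both calls return the same single message, but the scan examines all 1000 items to find it.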

You can answer question 3 yourself: DynamoDB storage is quite expensive, but we are talking about a difference of maybe 20 bytes per entry. If you may end up with more than 10,000,000 entries in your table this might become an issue; otherwise ignore it.
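As a quick back-of-the-envelope check, the estimate above can be worked out directly. The 20 bytes per entry is the rough figure from this answer, not a measured value, and no pricing is assumed here.

```python
def extra_storage_bytes(item_count, extra_bytes_per_item=20):
    """Total extra storage from keeping the redundant attributes."""
    return item_count * extra_bytes_per_item


total = extra_storage_bytes(10_000_000)
print(total / 1_000_000, "MB")  # 200.0 MB
```

So even at ten million entries the redundant columns add roughly 200 MB in total.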

Your particular example will not work, because you cannot put conditions on the partition key when querying. You can only have such conditions on the sort key.

That said, this sort of structure can come in handy at times. An example would be if you have three attributes that you want to query by. DynamoDB allows at most two key attributes (partition key + sort key), so in that case one of them could be a combination of two or more attributes.
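One way to sketch such a combined key for the Messages example is shown below. The attribute name conversation, the "#" delimiter, and the helper names are assumptions for illustration. The idea is to fold sender and recipient into the partition key and keep timestamp as the sort key, so a query can supply the full partition key and still put a condition on the timestamp.

```python
DELIMITER = "#"  # assumed separator; must not appear inside the IDs


def conversation_key(sender, recipient):
    """Combine two attributes into a single partition key value."""
    if DELIMITER in sender or DELIMITER in recipient:
        raise ValueError("IDs must not contain the delimiter")
    return f"{sender}{DELIMITER}{recipient}"


def build_item(sender, recipient, timestamp_ms, message):
    """Build an item with a combined partition key and a timestamp sort key."""
    return {
        "conversation": conversation_key(sender, recipient),  # partition key
        "timestamp": timestamp_ms,                            # sort key
        "sender": sender,
        "recipient": recipient,
        "message": message,
    }


item = build_item("JohnSmithID", "SallyMcDonaldID", 1461754484307, "Hello")
print(item["conversation"])  # JohnSmithID#SallyMcDonaldID
```

Using an explicit delimiter keeps the combined value unambiguous, which a plain concatenation such as JohnSmithID1461754484307SallyMcDonaldID does not guarantee.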