
DynamoDB - GSI versus duplication

I have a question about many-to-many relationships within DynamoDB and what happens with a GSI versus shallow duplication.

Say I want to model the standard many-to-many relationship within social media: a user can follow many pages and a page has many followers. So, the access patterns are that you need to pull all the followers for a page and you need to see all the pages that a user follows.

If you create an item that has a partition key of the page ID and a sort key of the user ID, this lets you pull all followers for that page.

You could then place a GSI on that item with an inverted index. This would let you query all pages a user is following.
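As a rough sketch of what "a GSI with an inverted index" could look like, the dictionary below is shaped like the keyword arguments that boto3's DynamoDB client accepts for `create_table`. The table name `Follows`, the index name `InvertedIndex`, the on-demand billing mode, and the `KEYS_ONLY` projection are assumptions made for illustration; none of them come from the question.

```python
# Hypothetical table definition: the base table is keyed on (PK, SK) and the
# GSI swaps the two, so SK becomes the partition key and PK the sort key.
TABLE_DEFINITION = {
    "TableName": "Follows",  # assumed name
    "BillingMode": "PAY_PER_REQUEST",  # assumed; provisioned mode also works
    "AttributeDefinitions": [
        {"AttributeName": "PK", "AttributeType": "S"},
        {"AttributeName": "SK", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "PK", "KeyType": "HASH"},   # FOLLOWEDPAGE#<PageID>
        {"AttributeName": "SK", "KeyType": "RANGE"},  # USER#<UserId>
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "InvertedIndex",  # assumed name
            "KeySchema": [
                {"AttributeName": "SK", "KeyType": "HASH"},
                {"AttributeName": "PK", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
}


def key_schema_as_dict(schema):
    """Map key type (HASH or RANGE) to attribute name for easy comparison."""
    return {entry["KeyType"]: entry["AttributeName"] for entry in schema}


table_keys = key_schema_as_dict(TABLE_DEFINITION["KeySchema"])
index_keys = key_schema_as_dict(
    TABLE_DEFINITION["GlobalSecondaryIndexes"][0]["KeySchema"]
)

# With AWS credentials configured, the definition could be passed as:
#   boto3.client("dynamodb").create_table(**TABLE_DEFINITION)
```

The index reuses the same two attributes as the table, only with their roles swapped, which is why no extra attributes have to be written to the item.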

What exactly is happening there? Is DynamoDB duplicating that data somewhere with the keys rearranged? Is this any different than just creating a second item in the table with a partition key of the user and a sort key of the page?

So, you have this item:

Item 1:
PK                       SK
FOLLOWEDPAGE#<PageID>    USER#<UserId>

And you can create a GSI and invert SK and PK, or you could simply create this second item:

Item 2:
PK                       SK
FOLLOWINGUSER#<UserId>   PAGE#<PageID>

Other than the fact that you now have to maintain this second item, how is this functionally different?
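To make the duplication approach concrete, here is a small in-memory sketch, not real DynamoDB code. The helper names `follow_items` and `query`, and the IDs `p1`, `u1` and so on, are hypothetical. Each follow produces both items, and a plain list stands in for the table.

```python
def follow_items(page_id: str, user_id: str) -> list:
    """Build both items for one follow relationship (the DIY duplication)."""
    return [
        # Item 1: lets you list the followers of a page.
        {"PK": f"FOLLOWEDPAGE#{page_id}", "SK": f"USER#{user_id}"},
        # Item 2: lets you list the pages a user follows.
        {"PK": f"FOLLOWINGUSER#{user_id}", "SK": f"PAGE#{page_id}"},
    ]


def query(table: list, pk: str) -> list:
    """Return the sort keys of all items under one partition key, sorted."""
    return sorted(item["SK"] for item in table if item["PK"] == pk)


table = []
for page_id, user_id in [("p1", "u1"), ("p1", "u2"), ("p2", "u1")]:
    table.extend(follow_items(page_id, user_id))

followers_of_p1 = query(table, "FOLLOWEDPAGE#p1")
pages_followed_by_u1 = query(table, "FOLLOWINGUSER#u1")
```

Both access patterns are served from the base table, at the price of two items per follow. In a real table the two writes would have to be kept in sync by the application, for example with a transactional write, which is the maintenance burden mentioned above.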

Does a GSI duplicate items with that index? Does it duplicate items without that index?

Is DynamoDB duplicating that data somewhere with the keys rearranged?

Yes, a secondary index is an opaque copy of your data. As the docs say: "A secondary index is a data structure that contains a subset of attributes from a table, along with an alternate key to support Query operations." You choose what data gets copied (in DynamoDB terms: projected) to the index.
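The effect of the projection setting can be illustrated with a plain-Python model. This is only a sketch of the documented behavior of the three projection types (`KEYS_ONLY`, `INCLUDE`, `ALL`), not how DynamoDB is implemented; the function `project` and the attributes `followedAt` and `source` are hypothetical.

```python
def project(item, index_keys, table_keys, projection_type, non_key_attributes=()):
    """Return the copy of `item` that would be stored in the index."""
    if projection_type == "ALL":
        return dict(item)  # every attribute is copied
    # The index keys and the table's primary key are always projected.
    names = set(index_keys) | set(table_keys)
    if projection_type == "INCLUDE":
        names |= set(non_key_attributes)  # plus the attributes you list
    elif projection_type != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in names}


item = {
    "PK": "FOLLOWEDPAGE#p1",
    "SK": "USER#u1",
    "followedAt": "2024-01-01",  # hypothetical extra attribute
    "source": "mobile",          # hypothetical extra attribute
}
table_keys = ("PK", "SK")
index_keys = ("SK", "PK")  # inverted index

keys_only = project(item, index_keys, table_keys, "KEYS_ONLY")
include = project(item, index_keys, table_keys, "INCLUDE", ["followedAt"])
everything = project(item, index_keys, table_keys, "ALL")
```

A narrower projection means a smaller copy, so less storage and write cost on the index, but any attribute that is not projected has to be fetched from the base table.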

Is this any different than just creating a second item in the table with a partition key of the user and a sort key of the page?

Apart from the maintenance burden you mention, conceptually they are similar. There are some technical differences between a Global Secondary Index and DIY replication:

  • A GSI has its own provisioned throughput settings, separate from the table's, although the read and write units consumed and storage costs incurred are the same for both approaches.
  • A GSI is eventually consistent: writes to the table are propagated to the index asynchronously, and queries against the index cannot request strongly consistent reads.
  • A Scan operation will be ~2x worse with the DIY approach, because the table is ~2x bigger.
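The consistency difference shows up in how the index is queried. The sketch below only builds the keyword arguments for the low-level boto3 client's `query` call; the table name `Follows` and index name `InvertedIndex` are assumptions, and the helper `followed_pages_query` is hypothetical.

```python
def followed_pages_query(user_id: str) -> dict:
    """Build Query arguments that list the pages a user follows via the GSI."""
    return {
        "TableName": "Follows",        # assumed table name
        "IndexName": "InvertedIndex",  # assumed index name
        # On the inverted index, SK (USER#<UserId>) is the partition key.
        "KeyConditionExpression": "#sk = :user",
        "ExpressionAttributeNames": {"#sk": "SK"},
        "ExpressionAttributeValues": {":user": {"S": f"USER#{user_id}"}},
        # ConsistentRead is deliberately left out: a GSI only supports
        # eventually consistent reads, so requesting a strongly consistent
        # read on it is rejected.
    }


params = followed_pages_query("u1")

# With AWS credentials configured, this could be run as:
#   boto3.client("dynamodb").query(**params)
```

With the DIY approach the same lookup would be an ordinary Query on the base table's `FOLLOWINGUSER#<UserId>` partition, where a strongly consistent read can be requested.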

See the Best practices for using secondary indexes in DynamoDB for optimization patterns.

Does a GSI duplicate items with that index?

Yes.

Does it duplicate items without that index?

No. An item is copied to the index only if it has the index's key attributes.
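These two answers describe what is often called a sparse index: an item appears in a GSI only when it carries the index's key attributes. A minimal in-memory model, assuming hypothetical index key attributes named `GSI1PK` and `GSI1SK` and a hypothetical helper `build_index`:

```python
def build_index(table_items, index_partition_key, index_sort_key):
    """Copy only the items that have both index key attributes."""
    return [
        dict(item)
        for item in table_items
        if index_partition_key in item and index_sort_key in item
    ]


table_items = [
    # Has both index key attributes, so it is copied to the index.
    {"PK": "FOLLOWEDPAGE#p1", "SK": "USER#u1",
     "GSI1PK": "USER#u1", "GSI1SK": "PAGE#p1"},
    # Has neither index key attribute, so it is not copied.
    {"PK": "PAGE#p1", "SK": "METADATA"},
]

index_items = build_index(table_items, "GSI1PK", "GSI1SK")
```

For the inverted index in the question the index keys are `SK` and `PK`, which every item in the table has, so in that design every item ends up in the index.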


 