
DynamoDB - GSI versus duplication

I have a question about many-to-many relationships within DynamoDB and what happens on a GSI versus shallow duplication.

Say I want to model the standard many-to-many within social media: a user can follow many other pages and a page has many followers. So, your access patterns are that you need to pull all the followers for a page and you need to see all the pages that a user follows.

If you create an item that has a partition key of the page ID and a sort key of the user ID, this lets you pull all followers for that page.

You could then place a GSI on that item with an inverted index. This would let you query all pages a user is following.
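As a rough sketch of what such an inverted index could look like, the function below builds the parameters for a `create_table` call. The table name `Follows`, the index name `InvertedIndex`, on-demand billing and the `KEYS_ONLY` projection are all assumptions made for illustration, not part of the original design.

```python
def follow_table_definition(table_name="Follows", index_name="InvertedIndex"):
    """Build create_table parameters for the table plus an inverted-key GSI.

    Table and index names are hypothetical placeholders.
    """
    return {
        "TableName": table_name,
        # On-demand billing is assumed so no throughput numbers are needed here.
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
        ],
        # Base table: partition on the page, sort by the user.
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                # Inverted index: the same two attributes with their roles swapped.
                "KeySchema": [
                    {"AttributeName": "SK", "KeyType": "HASH"},
                    {"AttributeName": "PK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
    }
```

With boto3 installed and credentials configured, this would be passed as `boto3.client("dynamodb").create_table(**follow_table_definition())`.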

What exactly is happening there? Is DynamoDB duplicating that data somewhere with the keys rearranged? Is this any different than just creating a second item in the table with a partition key of the user and a sort key of the page?

So, you have this item:

Item 1:
PK                       SK
FOLLOWEDPAGE#<PageID>    USER#<UserId>

And you can create a GSI and invert SK and PK, or you could simply create this second item:

Item 2:
PK                       SK
FOLLOWINGUSER#<UserId>   PAGE#<PageID>

Other than the fact that you now have to maintain this second item, how is this functionally different?
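To make the maintenance burden concrete, here is a minimal sketch of writing both items in one transaction so they cannot drift apart. The helper names and the table name passed in are hypothetical. The request shape follows the low-level `transact_write_items` API.

```python
def follow_items(page_id, user_id):
    """Return Item 1 and Item 2 in DynamoDB's low-level attribute format."""
    item1 = {
        "PK": {"S": f"FOLLOWEDPAGE#{page_id}"},
        "SK": {"S": f"USER#{user_id}"},
    }
    item2 = {
        "PK": {"S": f"FOLLOWINGUSER#{user_id}"},
        "SK": {"S": f"PAGE#{page_id}"},
    }
    return item1, item2


def follow_transaction(table_name, page_id, user_id):
    """Build transact_write_items parameters that put both items atomically."""
    return {
        "TransactItems": [
            {"Put": {"TableName": table_name, "Item": item}}
            for item in follow_items(page_id, user_id)
        ]
    }
```

With boto3 this would be sent as `client.transact_write_items(**follow_transaction("Follows", "42", "7"))`. An unfollow would need a matching pair of deletes, which is the bookkeeping a GSI does for you.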

Does a GSI duplicate items with that index? Does it duplicate items without that index?

Is DynamoDB duplicating that data somewhere with the keys rearranged?

Yes, a secondary index is an opaque copy of your data. As the docs say: "A secondary index is a data structure that contains a subset of attributes from a table, along with an alternate key to support Query operations." You choose what data gets copied (in DynamoDB terms, projected) to the index.
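As a toy model of projection (not DynamoDB's actual implementation), the function below shows which attributes of an item would end up in the index copy for each of the three projection types. The attribute names `followedAt` and `note` in the usage example are made up for illustration.

```python
def projected_copy(item, projection_type="KEYS_ONLY", non_key_attributes=()):
    """Return the attributes of `item` that would be copied into the index.

    Toy model: the key attributes PK and SK are always kept, the rest
    depends on the projection type.
    """
    keys = {"PK", "SK"}
    if projection_type == "ALL":
        kept = set(item)
    elif projection_type == "INCLUDE":
        kept = keys | set(non_key_attributes)
    elif projection_type == "KEYS_ONLY":
        kept = keys
    else:
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in kept}


item = {
    "PK": "FOLLOWEDPAGE#42",
    "SK": "USER#7",
    "followedAt": "2024-01-01",  # hypothetical extra attribute
    "note": "met at a conference",  # hypothetical extra attribute
}
keys_only = projected_copy(item)
with_date = projected_copy(item, "INCLUDE", ["followedAt"])
everything = projected_copy(item, "ALL")
```

The less you project, the less storage the index consumes, at the cost of having to go back to the base table for attributes that were left out.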

Is this any different than just creating a second item in the table with a partition key of the user and a sort key of the page?

Apart from the maintenance burden you mention, conceptually they are similar. There are some technical differences between a Global Secondary Index and DIY replication:

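  • A GSI requires its own provisioned throughput (in provisioned capacity mode), although the read and write units consumed and storage costs incurred are roughly the same for both approaches.
  • A GSI is eventually consistent. It does not support strongly consistent reads, whereas a Query against a DIY item in the base table can request one.
  • A Scan operation will be ~2x worse with the DIY approach, because the table is ~2x bigger.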
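The read side of the two approaches can be sketched as follows. These helpers only build `query` parameters. The function names, table name and index name are hypothetical. The GSI variant leaves out `ConsistentRead` because DynamoDB rejects strongly consistent reads on a global secondary index.

```python
def pages_followed_via_gsi(table_name, index_name, user_id):
    """Query the inverted index: its partition key is the table's SK attribute."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "SK = :sk",
        "ExpressionAttributeValues": {":sk": {"S": f"USER#{user_id}"}},
        # No ConsistentRead here: a GSI only offers eventually consistent reads.
    }


def pages_followed_via_duplicate(table_name, user_id, consistent=False):
    """Query the DIY second item directly in the base table."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"FOLLOWINGUSER#{user_id}"}},
        # The base table accepts a strongly consistent read if asked for one.
        "ConsistentRead": consistent,
    }
```

Either dictionary would be passed to `client.query(**params)` with boto3.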

See the Best practices for using secondary indexes in DynamoDB for optimization patterns.

Does a GSI duplicate items with that index?

Yes. Every item that has the index's key attributes is copied into the index.

Does it duplicate items without that index?

No. An item that lacks the index's key attributes is not written to the index.
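A small in-memory sketch of that rule, assuming a GSI keyed on the hypothetical attributes `GSI1PK` and `GSI1SK` rather than the inverted PK/SK index from the question (where every item carries both keys and is therefore always copied):

```python
def index_entries(items, index_partition_key, index_sort_key):
    """Toy model: keep only the items that carry both index key attributes."""
    return [
        item
        for item in items
        if index_partition_key in item and index_sort_key in item
    ]


items = [
    # Has both (hypothetical) index key attributes, so it appears in the index.
    {"PK": "FOLLOWEDPAGE#42", "SK": "USER#7", "GSI1PK": "USER#7", "GSI1SK": "PAGE#42"},
    # Lacks the index key attributes, so it is absent from the index.
    {"PK": "PAGE#42", "SK": "METADATA"},
]
indexed = index_entries(items, "GSI1PK", "GSI1SK")
```

Here `indexed` contains only the first item.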


 