
Amazon DynamoDB Single Table Design For Blog Application

I'm new to this community. I need some help designing an Amazon DynamoDB table for a personal project.

Overview: this is a simple photo gallery application with the following attributes.

  1. UserID
  2. PostID
  3. S3URL
  4. Caption
  5. Likes
  6. Reports
  7. UploadTime

I wish to perform the following queries:

  1. For a given user, fetch 'N' most recent posts
  2. For a given user, fetch 'N' most liked posts
  3. Give 'N' most recent posts (Newsfeed)
  4. Give 'N' most liked posts (Newsfeed)

My solution:

With UserID as the partition key, PostID as the sort key, and local secondary indexes on Likes and UploadTime, I can serve the first two queries.
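For reference, that layout could be written roughly as the table definition below. This is only a sketch: the table name `Posts`, the index names `ByLikes` and `ByUploadTime`, the on-demand billing mode, and storing UploadTime as an ISO-8601 string are all assumptions, not part of the original design. The dictionary follows the parameter shape that boto3's `create_table` accepts.

```python
def posts_table_definition():
    """Build create_table parameters for the UserID/PostID design (names are assumed)."""
    def lsi(index_name, sort_attribute):
        # A local secondary index always shares the table's partition key.
        return {
            "IndexName": index_name,
            "KeySchema": [
                {"AttributeName": "UserID", "KeyType": "HASH"},
                {"AttributeName": sort_attribute, "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }

    return {
        "TableName": "Posts",  # hypothetical name
        "AttributeDefinitions": [
            {"AttributeName": "UserID", "AttributeType": "S"},
            {"AttributeName": "PostID", "AttributeType": "S"},
            {"AttributeName": "Likes", "AttributeType": "N"},
            {"AttributeName": "UploadTime", "AttributeType": "S"},  # assumed ISO-8601 string
        ],
        "KeySchema": [
            {"AttributeName": "UserID", "KeyType": "HASH"},
            {"AttributeName": "PostID", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            lsi("ByLikes", "Likes"),
            lsi("ByUploadTime", "UploadTime"),
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption
    }
```

With boto3 installed and credentials configured, this would be passed as `boto3.client("dynamodb").create_table(**posts_table_definition())`.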

I'm confused about how to perform the query operations for 3 and 4 (Newsfeed). I know that I cannot query without a partition key, and a scan is not an efficient solution. Is there any workaround for operations 3 and 4?

Any ideas on how I should design my table?

You could add two Global Secondary Indexes.

For 3):

Create a static attribute type with the value post, which serves as the partition key for the GSI, and use the attribute UploadTime as the sort key. You can then query for type = "post" in descending sort-key order to get the most recent items.

The solution for 4) is very similar:

Create another global secondary index with the aforementioned type attribute as the partition key and Likes as the sort key. You can then query it in the same way as above. Note that GSIs are eventually consistent, so it may take a moment before updated like counts show up in the index.

Explanation and additional information
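As a rough sketch, both newsfeed queries could use the same request shape and differ only in the index they target. The table name `Posts` and the index names `RecentPostsIndex` and `LikedPostsIndex` are made up for illustration. The dictionary follows the parameters of the low-level DynamoDB `Query` operation as exposed by boto3's client, and `#t` is a placeholder used so that the attribute name `type` cannot clash with a reserved word.

```python
def newsfeed_query(index_name, n, table_name="Posts"):
    """Build Query parameters that return the top n items of a 'type = post' GSI."""
    return {
        "TableName": table_name,           # hypothetical table name
        "IndexName": index_name,           # hypothetical index name
        "KeyConditionExpression": "#t = :post",
        "ExpressionAttributeNames": {"#t": "type"},
        "ExpressionAttributeValues": {":post": {"S": "post"}},
        "ScanIndexForward": False,         # descending sort key: newest / most liked first
        "Limit": n,
    }


recent = newsfeed_query("RecentPostsIndex", 10)  # sort key: UploadTime
liked = newsfeed_query("LikedPostsIndex", 10)    # sort key: Likes
```

With boto3 available, either dictionary would be passed as `boto3.client("dynamodb").query(**recent)`.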

Using this approach you group all posts in a single item collection, which allows for efficient queries. To save on storage space and RCUs, you can also choose to only project a subset of attributes into the index.

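This design isn't ideal for a very large or write-heavy data set, because every post shares one partition key value in the index and that value can become a hot partition. (The 10 GB item collection limit itself applies only to tables with local secondary indexes, not to GSIs.) For a smaller application it will work fine.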

If you're going for a single-table design, I'd recommend using generic names for the key attributes: PK, SK, GSI1PK, GSI1SK, GSI2PK, GSI2SK. You can then duplicate the relevant attribute values into these attributes on each item. This makes things less confusing if you store different entities in the table. Adding a type attribute that holds the entity type is also common.
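A post item with generic key attributes might be assembled as in the sketch below. The `USER#` and `POST#` prefixes and the lowercase `post` value are conventions assumed here for illustration. The original attributes are kept alongside the duplicated key values.

```python
def post_item(user_id, post_id, s3_url, caption, likes, upload_time):
    """Build one post item whose key attributes use generic names."""
    return {
        # Base table keys: all posts of a user share one partition.
        "PK": f"USER#{user_id}",   # prefix is an assumed convention
        "SK": f"POST#{post_id}",   # prefix is an assumed convention
        # GSI1: all posts ordered by upload time.
        "GSI1PK": "post",
        "GSI1SK": upload_time,
        # GSI2: all posts ordered by like count.
        "GSI2PK": "post",
        "GSI2SK": likes,
        # Entity type plus the original attributes.
        "type": "post",
        "UserID": user_id,
        "PostID": post_id,
        "S3URL": s3_url,
        "Caption": caption,
        "Likes": likes,
        "UploadTime": upload_time,
    }


item = post_item("u1", "p1", "s3://bucket/p1.jpg", "Sunset", 3, "2021-01-15T09:30:00Z")
```

Because the like count is duplicated into GSI2SK, any update to Likes would need to update GSI2SK in the same write.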

It looks like you're off to a great start with your current design, well done!

For access pattern #3, you want to fetch the most recent posts. One way to approach this is to create a global secondary index (GSI) that aggregates posts by their creation time. For example, you could add an attribute named GSI1PK to the items in your main table, assign it a value of POSTS, and use the upload_time field as the sort key. That would look something like this:

[Image: POSTS global secondary index]

Viewing the secondary index (I've named it GSI1), your data would look like this:

[Image: GSI view of posts]

This would allow you to query for Posts and sort by upload_time. This is a great start. However, your POSTS partition will grow quite large over time. Instead of choosing POSTS as the partition key for your secondary index, consider using a truncated timestamp to group posts by date. For example, here's how you could store posts by the month they were created:

[Image: GSI posts with truncated timestamps]

Storing posts using a truncated timestamp will help you distribute your data across partitions, which will help your DB scale. If a month is too long, you could use truncated timestamps for a week/day/hour/etc. Whatever makes sense.
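A small helper could derive that partition key from the upload time. In this sketch the month format mirrors the `POSTS#2021-01-00` style used in this answer, while the day and hour formats are my own assumptions about how finer granularities might be written.

```python
from datetime import datetime

# Month format follows the POSTS#YYYY-MM-00 style; day and hour formats are assumed.
PARTITION_FORMATS = {
    "month": "%Y-%m-00",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%dT%H",
}


def posts_partition_key(upload_time, granularity="month"):
    """Return the GSI partition key for a post uploaded at upload_time."""
    return "POSTS#" + upload_time.strftime(PARTITION_FORMATS[granularity])


uploaded = datetime(2021, 1, 15, 9, 30)
monthly_key = posts_partition_key(uploaded)         # "POSTS#2021-01-00"
daily_key = posts_partition_key(uploaded, "day")    # "POSTS#2021-01-15"
```

The finer the granularity, the smaller each partition, but the more partitions a "most recent" query may have to visit.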

To fetch the N most recent posts, you'd simply query your secondary index for POSTS in the current month (e.g. POSTS#2021-01-00). If you don't get enough results, run the same query against the prior month (e.g. POSTS#2020-12-00). Keep doing this until your application has enough posts to show the client.
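The month-by-month fallback could be sketched like this. `query_partition` is a hypothetical callable standing in for a DynamoDB query against the index (newest first, limited to the requested count), and `max_months` is an assumed safety cap so the loop stops when there simply aren't enough posts.

```python
from datetime import date


def month_partition(year, month):
    """Partition key for one month, in the POSTS#YYYY-MM-00 style."""
    return f"POSTS#{year:04d}-{month:02d}-00"


def fetch_recent_posts(query_partition, n, today, max_months=12):
    """Collect up to n posts, walking backwards one month at a time.

    query_partition(partition_key, limit) is assumed to return at most
    `limit` posts from that partition, newest first.
    """
    posts = []
    year, month = today.year, today.month
    for _ in range(max_months):
        remaining = n - len(posts)
        if remaining <= 0:
            break
        posts.extend(query_partition(month_partition(year, month), remaining))
        # Step back to the previous month, rolling over the year if needed.
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return posts[:n]


# In-memory stand-in for the index, used only to exercise the loop.
fake_index = {
    "POSTS#2021-01-00": ["p5", "p4"],
    "POSTS#2020-12-00": ["p3", "p2", "p1"],
}


def fake_query(partition_key, limit):
    return fake_index.get(partition_key, [])[:limit]


feed = fetch_recent_posts(fake_query, 4, date(2021, 1, 15))
```

Here January only holds two posts, so the loop falls through to December for the remaining two.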

For the fourth access pattern, you'd like to fetch the most liked posts. One way to implement this access pattern is to define another GSI with "LIKES" as the partition key and the number of likes as the sort key.

[Image: Posts by number of likes]

If you intend to introduce a date range on the number of likes (e.g. most popular posts this week/month/year/etc.), you could use the truncated timestamp approach I outlined for the previous access pattern.
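If the date range spans several partitions, one possible approach (my assumption, not something the index gives you for free) is to take the top N from each partition and merge them in the application. The overall top N is always contained in the union of the per-partition top N lists. `query_partition` is again a hypothetical stand-in for a query against the likes index, and the `LIKES#YYYY-MM-00` keys are an assumed naming scheme.

```python
import heapq


def top_liked(query_partition, partition_keys, n):
    """Merge the per-partition top-n lists into one overall top-n list.

    query_partition(partition_key, limit) is assumed to return at most
    `limit` posts from that partition, most liked first.
    """
    candidates = []
    for partition_key in partition_keys:
        candidates.extend(query_partition(partition_key, n))
    return heapq.nlargest(n, candidates, key=lambda post: post["likes"])


# In-memory stand-in for the likes index, used only to exercise the merge.
fake_likes = {
    "LIKES#2021-01-00": [{"id": "a", "likes": 50}, {"id": "b", "likes": 10}],
    "LIKES#2020-12-00": [{"id": "c", "likes": 70}, {"id": "d", "likes": 20}],
}


def fake_likes_query(partition_key, limit):
    return fake_likes[partition_key][:limit]


top_three = top_liked(fake_likes_query, list(fake_likes), 3)
```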

When you find yourself with "fetch most recent" access patterns, you may want to check out KSUIDs. KSUIDs, or K-Sortable Unique Identifiers, are unique identifiers that sort by their creation date/time. Think of them as a UUID and a timestamp combined into one attribute. This could be useful for your first access pattern, where you fetch the most recent posts for a user. If you were to use a KSUID for the Post ID, your table would look like this:

[Image: KSUID]

I've replaced the Post IDs in this example with KSUIDs. Because KSUIDs are unique and sortable by creation time, you can support your first access pattern without any additional indexing.

There are KSUID libraries for most popular programming languages, so implementing this feature is pretty simple.
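To show why these IDs sort by time, here is a simplified sketch of the KSUID layout as I understand it: a 4-byte timestamp (seconds since a custom epoch of 1400000000) followed by 16 random bytes, base62-encoded to 27 characters. It is meant as an illustration only, and a maintained library is the better choice for real use.

```python
import os
import time

KSUID_EPOCH = 1_400_000_000  # custom epoch, in Unix seconds
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def make_ksuid(unix_seconds=None, payload=None):
    """Build a KSUID-style ID: 4-byte timestamp + 16 random bytes, base62-encoded."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    if payload is None:
        payload = os.urandom(16)
    if len(payload) != 16:
        raise ValueError("payload must be exactly 16 bytes")
    # The timestamp occupies the most significant bytes, so it dominates ordering.
    raw = (unix_seconds - KSUID_EPOCH).to_bytes(4, "big") + payload
    value = int.from_bytes(raw, "big")
    digits = []
    while value:
        value, remainder = divmod(value, 62)
        digits.append(BASE62[remainder])
    # Fixed width keeps string order identical to numeric order.
    return "".join(reversed(digits)).rjust(27, "0")


older = make_ksuid(1_600_000_000)
newer = make_ksuid(1_600_000_001)
```

Since `older < newer` as plain strings, using such an ID as the sort key and querying a user's partition in descending order would return the newest posts first.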


 