Adapting one-to-many relationship to DynamoDB (NoSQL)

Introduction

Hello, I'm moving to AWS for stability, performance, etc. I will be using DynamoDB because its always-free tier lets me reduce my bills a lot. Until now I was using MySQL. I will keep the attributes simple for this example (to show exactly where I need help and keep the question short).

My current DB has fewer than 5k rows, and I expect it to grow to 20-30k in 2 years. Each user (without any group/order data) is around 600 bytes. I don't know how this will translate to a NoSQL DB, but I expect it to be less than 10 MB.

What data will I have?

User:

  • username
  • password
  • is_group_member

Group:

  • capacity
  • access_level

Order:

  • oid
  • status
  • prod_id

Relationships:

  • User has many orders.
  • Group has many users.

How will I access the data and what will I get?

  1. I will access the user by username (I won't know the group he is in). I will need to get the user's data, the group he belongs to and its data.
  2. I will access the users that belong to a certain group. I will need to get the users' data and the group data.
  3. I will access an order by its oid. I will need to get the user it belongs to and its data.

What I tried

I watched a series of videos by Gary Jennings, read answers on SO and also read alexdebrie's article about one-to-many relationships. My problem is that I can't seem to find an alternative that suits all the ways I will access the data.

For example:

  1. Denormalization: it will leave me with a lot of duplicated data, thus increasing the cost.
  2. Composite primary key: I will be able to access the users by their group, but how will I access the user and the group's data without knowing the group beforehand? I would need 2 requests, making it inefficient and increasing the costs.
  3. Secondary index + the Query API action: again, I would need 2 requests, making it inefficient and increasing the costs.

Final questions

  • Did I misunderstand the alternatives? I started this question because my knowledge is not "big enough" to know whether there is a better alternative that I can't think of, so maybe I got the explanations wrong.
  • Is there a better alternative for this case?
  • If there isn't a better alternative, what would you do in my case? Would you duplicate the group's data (using more space but needing only 1 request), or would you use one of the other 2 alternatives and make 2 requests?

You're off to a great start by articulating your access patterns ahead of time.

Let's start by addressing some of your comments about data modeling in DynamoDB:

  1. Denormalization: it will leave me with a lot of duplicated data thus increasing the cost.

When first learning DynamoDB data modeling, prior SQL Database knowledge can really get in the way. Normalizing your data is a common practice when working with SQL databases. However, denormalizing your data is a key data modeling strategy in DynamoDB.

One BIG reason you want to denormalize your data: DynamoDB doesn't have joins. Because DDB doesn't have joins, you'll be well served to pre-join your data so it can be fetched in a single query.

This blog post does a good job of explaining why denormalization is important in DDB.

Keep in mind, storage is cheap. Denormalizing your data makes for faster data access at a relatively low cost. With the size of your database, you will likely be well under the free tier threshold. Don't stress about the duplicate data!
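To put "storage is cheap" in numbers, here is a back-of-the-envelope estimate using the figures from the question (up to 30k users at roughly 600 bytes each). The duplication factors are assumptions chosen to be pessimistic, the 100-byte per-item overhead is the one DynamoDB documents for storage billing, and 25 GB is the always-free storage allowance. Orders are not counted.

```python
# Back-of-the-envelope storage estimate for the denormalized design.
# USERS and USER_ITEM_BYTES come from the question; the other factors are assumptions.
USERS = 30_000                   # expected size in ~2 years (question)
USER_ITEM_BYTES = 600            # approximate size of one user (question)
ITEMS_PER_USER = 2               # assumed: profile item + one group item repeating user data
GSI_COPIES = 2                   # assumed: base table + a GSI that projects all attributes
OVERHEAD_BYTES = 100             # per-item storage overhead DynamoDB adds for billing
FREE_TIER_BYTES = 25 * 1024**3   # 25 GB always-free storage (treated as GiB here)


def estimated_bytes() -> int:
    """Total billable bytes, counting every duplicated copy of every user item."""
    per_item = USER_ITEM_BYTES + OVERHEAD_BYTES
    return USERS * ITEMS_PER_USER * GSI_COPIES * per_item


total = estimated_bytes()
print(f"{total / 1_000_000:.0f} MB, {total / FREE_TIER_BYTES:.2%} of the free tier")
# 84 MB, 0.31% of the free tier
```

Even with every user item stored four times, the table stays far below the free storage limit.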

  2. Composite primary key: I will be able to access the users by its group but how will I access the user and the group's data without knowing the group beforehand. I would need to use 2 requests making it inefficient and increasing the costs.

Denormalizing your data will help solve this problem (e.g., store the group info with the user). I'll give you an example of this below.

  3. Secondary index + the Query API action: Again I would need to use 2 requests making it inefficient and increasing the costs.

You didn't share your primary key structure, so I'm not sure what scenario will require two requests. However, I will say that there may be certain situations where making two requests to DDB is a reasonable approach. Making two efficient query operations is not the end of the world.
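For a sense of scale, the sketch below applies DynamoDB's read-capacity rounding: a GetItem rounds its item up to the next 4 KB, while a Query sums the sizes of all returned items and rounds up once, and an eventually consistent read costs half of a strongly consistent one. The item sizes (a ~600-byte user item from the question and an assumed ~800-byte group item) are illustrative, and the one-unit minimum per request is an assumption of this sketch.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB


def _rcu(units: int, strongly_consistent: bool) -> float:
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return float(units) if strongly_consistent else units / 2


def get_item_rcu(item_bytes: int, strongly_consistent: bool = False) -> float:
    """GetItem: the single item's size is rounded up to the next 4 KB (at least one unit)."""
    return _rcu(max(1, math.ceil(item_bytes / READ_UNIT_BYTES)), strongly_consistent)


def query_rcu(item_sizes: list[int], strongly_consistent: bool = False) -> float:
    """Query: sizes of all returned items are summed, then rounded up once (at least one unit)."""
    return _rcu(max(1, math.ceil(sum(item_sizes) / READ_UNIT_BYTES)), strongly_consistent)


PROFILE_BYTES = 600  # user item size from the question
GROUP_BYTES = 800    # assumed size of the group item

two_requests = get_item_rcu(PROFILE_BYTES) + get_item_rcu(GROUP_BYTES)
one_query = query_rcu([PROFILE_BYTES, GROUP_BYTES])
print(two_requests, one_query)
# 1.0 0.5
```

With items this small, the gap is half a read unit per lookup; the more noticeable cost of a second request is usually the extra round trip.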

OK, on to an example of modeling your relationships! Keep in mind that there are many ways to model data in DynamoDB. This example is not THE way. Rather, it's an example meant to demonstrate a few strategies that might help.

Here's one take of your data model:

[Image: the base table data model]

With this arrangement, you can support the following access patterns:

  1. Fetch user information - PK = USER#[username] SK = USER#[username]
  2. Fetch user group - PK = USER#[username] SK begins_with GROUP#. Notice I denormalized user data in the group item. The reason for this will be apparent shortly :)
  3. Fetch user orders - PK = USER#[username] SK begins_with ORDER#
  4. Fetch all user data - PK = USER#[username]
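To make the key layout concrete, here is a minimal in-memory sketch in Python (not the boto3 API) that mimics GetItem and Query: an exact match on PK, an optional begins_with on SK, and results ordered by SK. The usernames, the group ID g1, the order IDs and all attribute values are hypothetical sample data; the GROUP# item carries the group's attributes next to the username, following the denormalization described above.

```python
from typing import Any, Optional

Item = dict[str, Any]


def user_key(username: str) -> str:
    return f"USER#{username}"


def group_key(group_id: str) -> str:
    return f"GROUP#{group_id}"


def order_key(oid: str) -> str:
    return f"ORDER#{oid}"


# Hypothetical sample data: alice's partition, plus bob in the same group.
TABLE: list[Item] = [
    {"PK": user_key("alice"), "SK": user_key("alice"), "username": "alice", "is_group_member": True},
    # Group item in the user's partition: group attributes + username (denormalized).
    {"PK": user_key("alice"), "SK": group_key("g1"), "username": "alice", "capacity": 10, "access_level": "admin"},
    {"PK": user_key("alice"), "SK": order_key("o1"), "oid": "o1", "status": "shipped", "prod_id": "p42"},
    {"PK": user_key("alice"), "SK": order_key("o2"), "oid": "o2", "status": "pending", "prod_id": "p7"},
    {"PK": user_key("bob"), "SK": user_key("bob"), "username": "bob", "is_group_member": True},
    {"PK": user_key("bob"), "SK": group_key("g1"), "username": "bob", "capacity": 10, "access_level": "admin"},
]


def get_item(table: list[Item], pk: str, sk: str) -> Optional[Item]:
    """Mimic GetItem: exact match on the full primary key."""
    return next((it for it in table if it["PK"] == pk and it["SK"] == sk), None)


def query(table: list[Item], pk: str, sk_prefix: Optional[str] = None) -> list[Item]:
    """Mimic Query: PK must match exactly, SK optionally begins_with a prefix; sorted by SK."""
    hits = [it for it in table
            if it["PK"] == pk and (sk_prefix is None or it["SK"].startswith(sk_prefix))]
    return sorted(hits, key=lambda it: it["SK"])


profile = get_item(TABLE, user_key("alice"), user_key("alice"))  # 1. user information
groups = query(TABLE, user_key("alice"), "GROUP#")               # 2. user group, group ID not needed
orders = query(TABLE, user_key("alice"), "ORDER#")               # 3. user orders
everything = query(TABLE, user_key("alice"))                     # 4. all user data in one request

print(profile["username"], groups[0]["access_level"], [o["oid"] for o in orders], len(everything))
# alice admin ['o1', 'o2'] 4
```

In DynamoDB itself, pattern 2 would be a Query with a `begins_with(SK, :prefix)` key condition, and results come back in ascending sort-key order by default, which is what the `sorted` call imitates.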

To support your remaining access patterns, I created a secondary index whose partition key and sort key are the base table's sort key and partition key, swapped. This pattern is called an inverted index. The secondary index looks like this:

[Image: the GSI (inverted index)]

This secondary index supports the following access patterns:

  • Fetch group users - PK = GROUP#[groupId]
  • Fetch Order by oid - PK = ORDER#[oid]
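A minimal in-memory sketch of the inverted index, under the same assumptions as a plain Python model rather than the AWS SDK: every base-table item is re-keyed with the base SK as the index partition key and the base PK as the index sort key, then queried by exact partition key. The sample items and attribute values are hypothetical.

```python
from typing import Any

Item = dict[str, Any]

# Hypothetical base-table items in the PK/SK layout described above.
BASE_TABLE: list[Item] = [
    {"PK": "USER#alice", "SK": "USER#alice", "username": "alice"},
    {"PK": "USER#alice", "SK": "GROUP#g1", "username": "alice", "capacity": 10, "access_level": "admin"},
    {"PK": "USER#alice", "SK": "ORDER#o1", "oid": "o1", "status": "shipped", "prod_id": "p42"},
    {"PK": "USER#bob", "SK": "USER#bob", "username": "bob"},
    {"PK": "USER#bob", "SK": "GROUP#g1", "username": "bob", "capacity": 10, "access_level": "admin"},
]


def build_inverted_index(table: list[Item]) -> list[Item]:
    """Swap the keys: index partition key = base SK, index sort key = base PK.

    In DynamoDB you would simply declare a GSI keyed on SK (partition) and PK (sort);
    the GSI_PK/GSI_SK copies here only make the swap explicit.
    """
    return [{**item, "GSI_PK": item["SK"], "GSI_SK": item["PK"]} for item in table]


def query_index(index: list[Item], pk: str) -> list[Item]:
    """Mimic a Query on the GSI: exact match on the index partition key, sorted by index sort key."""
    hits = [it for it in index if it["GSI_PK"] == pk]
    return sorted(hits, key=lambda it: it["GSI_SK"])


GSI = build_inverted_index(BASE_TABLE)

# Fetch group users: each group item already carries the user's and the group's data.
members = query_index(GSI, "GROUP#g1")
print([m["username"] for m in members], members[0]["capacity"])
# ['alice', 'bob'] 10

# Fetch order by oid: the owner's username is the order item's base-table PK.
order = query_index(GSI, "ORDER#o1")[0]
print(order["status"], order["PK"].removeprefix("USER#"))
# shipped alice
```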

You can see that I denormalized the User and Group relationship by repeating user data in the item representing the Group. This helps me with the "fetch group users" access pattern.

Again, this is just one way you can achieve the access patterns you described. There are many strategies, but many will require that you abandon some of the best practices you learned working with SQL databases!

The technical post webpages of this site follow the CC BY-SA 4.0 license.