简体   繁体   中英

DynamoDB - Design 1 to Many relationship

I'm new at DynamoDB technologies but not at NoSQL (I've already done some project using Firebase).

Read that a DynamoDB best practice is one table per application I've been having a hard time on how to design my 1 to N relationship.

I have this entity (pseudo-json):

{
   machineId: 'HASH_ID'
   machineConfig: /* a lot of fields */
}

A machineConfig is unique for each machine and can change rarely and only by an administration (no consistency issue here).

The issue is that I have to manage a log of data from the sensors of each machine. The log is described as:

{
  machineId: 'HASH_ID',
  sensorsData: [
    /* Huge list of: */
    { timestamp: ..., data: /* lot of fields */ },
    ...
  ]
}

I want to keep my machineConfig in one place. Log list can't be insert into the machine entity because it's a continuous stream of data taken over time.

Furthermore, I don't understand which could be the composite key, the partition key obviously is the machineId , but what about the order key?

How to design this relationship taking into account the potential dimensions of data?

You could do this with 1 table. The primary key could be (machineId, sortKey) where machineId is the partition key and sortKey is a string attribute that is going to be used to cover the 2 cases. You could probably come up with a better name.

To store the machineConfig you would insert an item with primary key (machineId, "CONFIG") . The sortKey attribute would have the constant value CONFIG .

To store the sensorsData you could use the timestamp as the sortKey value. You would insert a new item for each piece of sensor data. You would store the timestamp as a string (as time since the epoch, ISO8601, etc)

Then to query everything about a machine you would run a Dynamo query specifying just the machineId partition key - this would return many items including the machineConfig and the sensor data.

To query just the machineConfig you would run a Dynamo query specifying the machineId partition key and the constant CONFIG as the sortKey value

To query the sensor data you could specify an exact timestamp or a timestamp range for the sortKey . If you need to query the sensor data by other values then this design might not work as well.

Editing to answer follow up question:

You would have to resort to a scan with a filter to return all machines with their machineId and machineConfig . If you end up inserting a lot of sensor data then this will be a very expensive operation to perform as Dynamo will look at every item in the table. If you need to do this you have a couple of options.

If there are not a lot of machines you could insert an item with a primary key like ("MACHINES", "ALL") and a list of all the machineIds . You would query on that key to get the list of machineIds , then you would do a bunch of queries (or a batch get) to retrieve all the related machineConfigs . However since the max Dynamo item size is 400KB you might not be able to fit them all.

If there are too many machines to fit in one item you could alter the above approach a bit and have ("MACHINES", $machineIdSubstring) as a primary key and store chunks of machineIds under each sort key. For example, all machineIds that start with 0 go in ("MACHINES", "0") . Then you would query by each primary key 0-9, build a list of all machineIds and query each machine as above.

Alternatively, you don't have to put everything in 1 table - it is just a guideline that fits a lot of use cases. If there are too many machines to fit in less than 400KB but there aren't tens of thousands and you aren't trying to query all of them all the time, you could have a separate table of machineId and machineConfig that you resort to scanning when necessary.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM