
Modeling data in NoSQL DynamoDB

I'm trying to figure out how to model the following data in an AWS DynamoDB table.

I have a lot of IoT devices, and each one sends telemetry data every few seconds.

Attributes

  1. device_id
  2. timestamp
  3. malware_name
  4. company_name
  5. action_performed (two possible values)

Queries

  1. Show all incidents that happened in the last week.
  2. Show all incidents for a specific device_id.
  3. Show all incidents with action "unable_to_remove".
  4. Show all incidents related to a specific malware.
  5. Show all incidents related to a specific company.

Thoughts

  1. I understand that I can add a GSI for each attribute, but I would like to use GSIs only if there is no other choice, as they cost more money.

  2. What should the primary key (partition key and sort key) be?

Please share your thoughts. I care about them more than about the perfect answer, as I'm trying to learn how to think and what to consider rather than get an answer to one specific question.

Thanks a lot!

If you absolutely need all the query patterns mentioned, you have no way out but to create a GSI for each. That too has its set of caveats:

  • For query #1, your GSI would have incident_date (or whatever) as partition-key and device_id as sort-key. This might lead to hot partitions in DynamoDB, depending on your access patterns.
  • There is a per-table quota on GSIs (5 when this was written, which these queries would use up right away; the default quota has since been raised to 20). What will you do if you need to support another kind of query in the future?
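To make the first caveat concrete: a DynamoDB Query needs an exact partition-key value, so with incident_date as the GSI partition key, "the last week" becomes one Query per day, while every new write lands on today's key. The sketch below only builds the request parameters; the table name `incidents` and index name `incident_date-index` are made up for illustration, and incident_date is assumed to be stored as a `YYYY-MM-DD` string.

```python
from datetime import date, timedelta

TABLE_NAME = "incidents"            # hypothetical table name
INDEX_NAME = "incident_date-index"  # hypothetical GSI name


def last_week_queries(today):
    """Build one Query request per calendar day for the 7 days ending today."""
    requests = []
    for offset in range(7):
        day = today - timedelta(days=offset)
        requests.append({
            "TableName": TABLE_NAME,
            "IndexName": INDEX_NAME,
            # Query requires equality on the partition key, hence one request per day.
            "KeyConditionExpression": "incident_date = :d",
            "ExpressionAttributeValues": {":d": {"S": day.isoformat()}},
        })
    return requests


requests = last_week_queries(date(2024, 1, 10))
# Each dict could be passed to a boto3 client as client.query(**request).
```

Seven small queries are cheap to read, but all writes for the current day share a single partition-key value on the index, which is where the hot-partition risk comes from.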

While evaluating pros and cons of using NoSQL for a given situation, one needs to consider both read and write access patterns. So, the question you should ask is, why DynamoDB?

For example, do you really need real-time queries? If not, you can use DynamoDB as the main database and periodically sync data (using AWS Lambda or Kinesis Firehose) to EMR or Redshift for later batch processing.
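One way the Lambda side of such a sync could look, assuming the function is triggered by a DynamoDB stream on the table (the answer does not say how the Lambda gets its data, so that part is an assumption). The handler only turns newly inserted items into plain dicts; where they are written afterwards (S3, Firehose, and so on) is left out.

```python
def flatten_image(image):
    """Convert a typed image such as {"device_id": {"S": "d1"}} into a plain dict.

    Only scalar attribute types are handled in this sketch.
    """
    return {name: next(iter(typed.values())) for name, typed in image.items()}


def handler(event, context=None):
    """Collect newly inserted incidents from a DynamoDB stream event."""
    rows = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE events
        rows.append(flatten_image(record["dynamodb"]["NewImage"]))
    # A real function would now hand `rows` to the batch store of your choice.
    return rows


sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "NewImage": {
                    "device_id": {"S": "device-42"},
                    "malware_name": {"S": "example_malware"},
                }
            },
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"device_id": {"S": "device-7"}}}},
    ]
}
rows = handler(sample_event)
```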

Edit: Proposed primary key:

  • device_id as partition-key and incident_date as sort-key, if you know that no two incidents for a given device_id can arrive at exactly the same time.
  • If the above doesn't work, then incident_id as partition-key and incident_date as sort-key.
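A rough sketch of the first option, written as plain request parameters so it runs without AWS access. The table name `incidents` and the billing mode are assumptions, and incident_date is assumed to be stored as an ISO-8601 UTC string so that string ordering matches time ordering.

```python
from datetime import datetime, timedelta, timezone

TABLE_NAME = "incidents"  # hypothetical table name


def table_definition():
    """Parameters for create_table: device_id partition key, incident_date sort key."""
    return {
        "TableName": TABLE_NAME,
        "AttributeDefinitions": [
            {"AttributeName": "device_id", "AttributeType": "S"},
            {"AttributeName": "incident_date", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "device_id", "KeyType": "HASH"},
            {"AttributeName": "incident_date", "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption, not from the answer
    }


def device_incidents_since(device_id, since):
    """Parameters for query: all incidents of one device at or after `since`."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "device_id = :id AND incident_date >= :since",
        "ExpressionAttributeValues": {
            ":id": {"S": device_id},
            ":since": {"S": since.strftime("%Y-%m-%dT%H:%M:%SZ")},
        },
    }


now = datetime(2024, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
query = device_incidents_since("device-42", now - timedelta(days=7))
# With boto3 these would be used as client.create_table(**table_definition())
# and client.query(**query).
```

With this key, query #2 (all incidents for a device) and "last week for one device" need no GSI at all; the other queries still need an index or a different access path.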

