I'm trying to figure out how to model the following data in an AWS DynamoDB table.
I have a lot of IoT devices, and each one sends telemetry data every few seconds.
I understand that I can add a GSI for each attribute, but I would like to use GSIs only if there is no other choice, since they cost more money.
What should the primary key (partition key and sort key) be?
Please share your thoughts. I care about them more than about the perfect answer, because I'm trying to learn how to think and what to consider rather than get an answer to one specific question.
Thanks a lot!
If you absolutely need the query patterns mentioned, you have no way out but to create a GSI for each. That too has its set of caveats:
Take `incident_date` (or whatever) as partition key and `device_id` as sort key: this might lead to hot partitions in DynamoDB, depending on your access patterns. When evaluating the pros and cons of NoSQL for a given situation, you need to consider both read and write access patterns. So the question you should ask is: why DynamoDB?
For example, do you really need real-time queries? If not, you can use DynamoDB as the main database and periodically sync data (using AWS Lambda or Kinesis Firehose) to EMR or Redshift for later batch processing.
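To make the hot-partition caveat above a bit more concrete, here is a toy sketch in plain Python. It only counts how many writes share each partition-key value; it does not model DynamoDB's real hashing or throughput limits. The workload (1,000 devices, each reporting once on the same day) and the helper name `writes_per_partition_key` are assumptions made up for illustration.

```python
from collections import Counter


def writes_per_partition_key(items, partition_key):
    """Count how many writes share each distinct partition-key value."""
    return Counter(item[partition_key] for item in items)


# Hypothetical workload: 1,000 devices, each reporting once on the same day.
items = [
    {"device_id": f"device-{n:04d}", "incident_date": "2020-01-01"}
    for n in range(1000)
]

by_date = writes_per_partition_key(items, "incident_date")
by_device = writes_per_partition_key(items, "device_id")

# (number of distinct key values, writes on the busiest key value)
print(len(by_date), max(by_date.values()))      # 1 1000
print(len(by_device), max(by_device.values()))  # 1000 1
```

With the date as partition key, every write of the day targets the same key value, whereas keying on the device spreads the same writes over many values. Whether that actually hurts depends on your real traffic, so treat this as a way to reason about the pattern rather than a measurement.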
Edit: Proposed primary key (two options):
`device_id` as partition key and `incident_date` as sort key, if you know that no two incidents for a given `device_id` can arrive at exactly the same time.
`incident_id` as partition key and `incident_date` as sort key.
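As a rough sketch of the first option, the two helpers below build the request parameters for a table keyed on `device_id` and `incident_date`, and for a query that reads one device's items in a time range. Several things here are assumptions rather than part of the question: the table name `telemetry`, storing `incident_date` as an ISO-8601 string (so that lexicographic order matches time order), on-demand billing, and the helper names themselves. The dictionaries follow the parameter shape of the low-level DynamoDB `CreateTable` and `Query` operations, and nothing in the block talks to AWS.

```python
def table_definition(table_name="telemetry"):
    """Build CreateTable parameters: device_id is the partition (HASH) key,
    incident_date is the sort (RANGE) key."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "device_id", "AttributeType": "S"},
            {"AttributeName": "incident_date", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "device_id", "KeyType": "HASH"},
            {"AttributeName": "incident_date", "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def device_range_query(device_id, start, end, table_name="telemetry"):
    """Build Query parameters for one device between two ISO-8601 timestamps.
    Only the primary key is used, so no GSI is involved."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": (
            "device_id = :d AND incident_date BETWEEN :start AND :end"
        ),
        "ExpressionAttributeValues": {
            ":d": {"S": device_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }


params = device_range_query(
    "device-0001", "2020-01-01T00:00:00Z", "2020-01-01T23:59:59Z"
)
```

If you use boto3, these dictionaries could be passed as keyword arguments, for example `boto3.client("dynamodb").create_table(**table_definition())` and `boto3.client("dynamodb").query(**params)`. The point of the sketch is that "all telemetry for one device in a time window" is answered by the primary key alone, while queries on other attributes would still need a GSI or a scan.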