简体   繁体   中英

DynamoDB top item per partition

We are new to DynamoDB and struggling with what seems like it would be a simple task.

It is not actually related to stocks (it's about recording machine results over time) but the stock example is the simplest I can think of that illustrates the goal and problems we're facing.

The two query scenarios are:

  • All historical values of given stock symbol <= We think we have this figured out
  • The latest value of all stock symbols <= We do not have a good solution here!

Assume that updates are not synchronized, eg the moment of the last update record for TSLA maybe different than for AMZN.

The 3 attributes are just { Symbol, Moment, Value }. We could make the hash_key Symbol, range_key Moment, and believe we could achieve the first query easily/efficiently.

We also assume could get the latest value for a single, specified Symbol following https://stackoverflow.com/a/12008398

The SQL solution for getting the latest value for each Symbol would look a lot like https://stackoverflow.com/a/6841644

But... we can't come up with anything efficient for DynamoDB.

Is it possible to do this without either retrieving everything or making multiple round trips?

The best idea we have so far is to somehow use update triggers or streams to track the latest record per Symbol and essentially keep that cached. That could be in a separate table or the same table with extra info like a column IsLatestForMachineKey (effectively a bool). With every insert, you'd grab the one where IsLatestForMachineKey=1, compare the Moment and if the insertion is newer, set the new one to 1 and the older one to 0.

This is starting to feel complicated enough that I question whether we're taking the right approach at all, or maybe DynamoDB itself is a bad fit for this, even though the use case seems so simple and common.

There is a way that is fairly straightforward, in my opinion.

Rather than using a GSI, just use two tables with (almost) the exact same schema. The hash key of both should be symbol . They should both have moment and value . Pick one of the tables to be stocks-current and the other to be stocks-historical . stocks-current has no range key. stocks-historical uses moment as a range key.

Whenever you write an item, write it to both tables. If you need strong consistency between the two tables, use the TransactWriteItems api.

If your data might arrive out of order, you can add a ConditionExpression to prevent newer data in stocks-current from being overwritten by out of order data.

The read operations are pretty straightforward, but I'll state them anyway. To get the latest value for everything, scan the stocks-current table. To get historical data for a stock, query the stocks-historical table with no range key condition.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM