
Polling items from DynamoDB

AWS newbie here.

I have a DynamoDB table and 2+ nodes of a Java app reading from and writing to it. My use case is as follows: the app should fetch N items every X seconds based on a timestamp, process them, then remove them from the DB. Because the app may scale, other nodes might be reading from the DB at the same time, and I want to avoid processing the same items multiple times.

The question is: is there any way to implement something like a poll() method that fetches an item and immediately removes it (as an atomic operation), as if the table were a queue? As far as I can tell, the delete methods that DynamoDBMapper offers do not return the removed item's data.

My understanding is that you want to read and delete an item atomically; however, that is not possible with DynamoDB.

What is possible, however, is deleting the item and having its old value returned, which is closer to a delete-then-read. As you correctly pointed out, DynamoDBMapper does not support ReturnValues, but the low-level clients do.

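// AWS SDK for Java v1 (com.amazonaws.services.dynamodbv2); "id" is an assumed hash key name
// and `client` an assumed AmazonDynamoDB instance.
Map<String, AttributeValue> keyToDelete = Map.of("id", new AttributeValue("214141"));
DeleteItemRequest dir = new DeleteItemRequest()
    .withTableName("ABC")
    .withKey(keyToDelete)
    .withReturnValues(ReturnValue.ALL_OLD);
// getAttributes() holds the deleted item, or null (or empty) if no item had that key
Map<String, AttributeValue> deleted = client.deleteItem(dir).getAttributes();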

More info is in the DeleteItemRequest documentation.
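One way this could serve the original use case: each node reads candidate items as usual, then tries to delete each one while asking for the old values, and processes an item only if its own delete actually returned data. The sketch below is not DynamoDB code; it uses an in-memory ConcurrentHashMap as a hypothetical stand-in, where remove() plays the role of a delete with ReturnValues set to ALL_OLD. The class and method names (ClaimByDelete, poll) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the table: remove() plays the role of
// DeleteItem with ReturnValues=ALL_OLD (returns the old value, or null if absent).
class ClaimByDelete {
    static final Map<String, String> table = new ConcurrentHashMap<>();

    // Try to claim each candidate key; keep only the items this caller deleted itself.
    static List<String> poll(List<String> candidateKeys) {
        List<String> claimed = new ArrayList<>();
        for (String key : candidateKeys) {
            String old = table.remove(key); // atomic delete-and-return
            if (old != null) {
                claimed.add(old); // this caller won the item and may process it
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        table.put("1", "a");
        table.put("2", "b");
        List<String> keys = List.of("1", "2");
        System.out.println(poll(keys)); // first caller, prints [a, b]
        System.out.println(poll(keys)); // second caller, prints []
    }
}
```

The second caller sees the same candidate keys but gets nothing back, so it skips them. Note that with this approach an item is gone before it is processed, so a node that crashes mid-processing loses it.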

Consistency is a weak spot of DDB, but that's the price to pay for its scalability.

You said it yourself, you're looking for a queue, so why not use one?

I suggest:

  1. Create a Lambda that:
    • Reads the items
    • Publishes them to an SQS FIFO queue with message deduplication
    • Deletes the items from the DB
  2. Create an EventBridge schedule to run the Lambda every n minutes
  3. Have your nodes poll that queue instead of DDB
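The flow of the Lambda in step 1 can be sketched roughly as follows. This is a simplified in-memory model, not AWS SDK code: DedupQueue is a hypothetical stand-in for an SQS FIFO queue with deduplication, the table is a plain Map, and using the item key as the deduplication ID is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DrainToQueue {
    // Hypothetical stand-in for an SQS FIFO queue with message deduplication.
    static class DedupQueue {
        final Set<String> seenIds = new HashSet<>();
        final Deque<String> messages = new ArrayDeque<>();

        // Returns false when a message with this deduplication ID was already accepted.
        boolean send(String dedupId, String body) {
            if (!seenIds.add(dedupId)) {
                return false;
            }
            messages.addLast(body);
            return true;
        }
    }

    // One scheduled run: read the items, publish each one, then delete it.
    // Returns how many messages the queue actually accepted.
    static int runOnce(Map<String, String> table, DedupQueue queue) {
        int accepted = 0;
        for (String key : new ArrayList<>(table.keySet())) { // snapshot of the read
            if (queue.send(key, table.get(key))) {           // item key as dedup ID (assumed)
                accepted++;
            }
            table.remove(key);
        }
        return accepted;
    }

    public static void main(String[] args) {
        Map<String, String> table = new TreeMap<>(Map.of("1", "a", "2", "b"));
        DedupQueue queue = new DedupQueue();
        System.out.println(runOnce(table, queue)); // prints 2
        table.put("1", "a"); // simulate a stale read that shows item 1 again
        System.out.println(runOnce(table, queue)); // prints 0
        System.out.println(queue.messages);        // prints [a, b]
    }
}
```

The second run shows why deduplication matters here: if a later read still returns an item that was already published, the queue drops the repeat instead of handing it to the nodes twice.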

For this to work you have to consider a few things regarding timings:

  1. DDB will typically be consistent in under a second, but this isn't guaranteed.
  2. SQS deduplication only works for 5 minutes.
  3. EventBridge only supports minute level granularity, not seconds.

So you can run your Lambda as frequently as once a minute, but you can run your nodes as frequently (or infrequently) as you like.

If you run your Lambda less frequently than every 5 minutes, there is technically a chance of processing an item twice, because the deduplication window will have expired by the next run, but this is very unlikely to ever happen. (Technically this could still happen anyway if DDB took more than 10 minutes to become consistent, but again, that is extremely unlikely.)
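To make the timing argument concrete, here is a small model of a 5-minute deduplication window. It is a simplified assumption about how the window behaves (measured from when an ID was last accepted, with the boundary treated as expired), not the SQS implementation, and WindowedDedup is a made-up name.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class WindowedDedup {
    static final Duration WINDOW = Duration.ofMinutes(5);
    private final Map<String, Instant> lastAccepted = new HashMap<>();

    // Returns true if the message is accepted, false if it is dropped as a duplicate.
    boolean accept(String dedupId, Instant now) {
        Instant seen = lastAccepted.get(dedupId);
        if (seen != null && Duration.between(seen, now).compareTo(WINDOW) < 0) {
            return false; // same ID inside the window: dropped
        }
        lastAccepted.put(dedupId, now);
        return true;
    }

    public static void main(String[] args) {
        WindowedDedup queue = new WindowedDedup();
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(queue.accept("item-1", t0));                  // prints true
        System.out.println(queue.accept("item-1", t0.plusSeconds(60)));  // prints false
        System.out.println(queue.accept("item-1", t0.plusSeconds(600))); // prints true
    }
}
```

A repeat one minute later is dropped, but a repeat ten minutes later is accepted again, which is the case where a stale read combined with an infrequent schedule could lead to double processing.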


 