
activemq - wait for all messages to be consumed

I have a case where a bulk action processes multiple items. After all items are processed, I have to update the action status to completed. Items are processed in parallel by multiple consumers.

In theory, after processing an item, a consumer could check whether any items are left (or whether any messages for this action remain in the queue). But two consumers (A and B) might finish at the same time, run the check at the same time, and each see that the other one is not done yet, because neither transaction has been committed: consumer A does not see the changes made by consumer B, and consumer B does not see the changes made by consumer A, so neither of them updates the action status. Am I right?

How can I implement such a condition without an additional periodic status check and its overhead? A periodic check might be fine if there are thousands of items per action, but when there are usually only 1-2 long-running items it is very inefficient.

Thanks!

Edit: in short, what is the correct approach to trigger an action after a set of messages has been processed, given that:

  • messages must be processed in parallel
  • periodically checking whether all messages were processed is not the answer

You can have each consumer enqueue an are-we-done-yet message onto a new queue when it finishes processing (and before it commits its own transaction). The are-we-done-yet messages should go into a queue with a single consumer; when this consumer processes a message, it checks whether the original queue is empty. This serializes the check and resolves the issue that was originally caused by the parallelism.

This isn't the most efficient approach in the general case, but since you mentioned that you have just 1-2 long running items it may work for you. I've done this before in a similar situation and it works quite well.
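Here is a minimal in-memory sketch of the idea, assuming java.util.concurrent objects stand in for the two JMS queues; `AreWeDoneYetDemo`, `workerFinished` and `awaitCompletion` are hypothetical names, not ActiveMQ API. The `remainingItems` counter plays the role of the original queue's depth and only drops when a worker would commit its item. Because each worker decrements it before sending its notification, and only one thread ever reads the are-we-done-yet queue, the checker sees zero no later than when it takes the last notification.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for the "are-we-done-yet" pattern; all names are hypothetical.
class AreWeDoneYetDemo {
    // Stand-in for the depth of the original work queue: it only drops when a
    // worker "commits" an item, not when the worker first receives it.
    private final AtomicInteger remainingItems;
    // The extra are-we-done-yet queue; only awaitCompletion() ever reads it.
    private final BlockingQueue<String> areWeDoneYet = new LinkedBlockingQueue<>();

    AreWeDoneYetDemo(int items) {
        remainingItems = new AtomicInteger(items);
    }

    // Called by a worker once it has processed one item.
    void workerFinished(String itemId) {
        remainingItems.decrementAndGet(); // the item's transaction "commits"
        areWeDoneYet.add(itemId);         // notification becomes visible only afterwards
    }

    // The single checker: every "is the original queue empty?" check runs here,
    // one at a time, so two workers can no longer both miss the final state.
    void awaitCompletion() throws InterruptedException {
        while (true) {
            areWeDoneYet.take();
            if (remainingItems.get() == 0) {
                return; // here: mark the action as completed (application-specific)
            }
        }
    }

    static boolean simulate(int items, int workers) {
        AreWeDoneYetDemo demo = new AreWeDoneYetDemo(items);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < items; i++) {
            String id = "item-" + i;
            pool.submit(() -> demo.workerFinished(id)); // real processing omitted
        }
        try {
            demo.awaitCompletion();
            return demo.remainingItems.get() == 0;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real setup, the idea is that sending the are-we-done-yet message in the same transacted session as the work item means the notification only becomes visible once that item is committed.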

You want a first-class batching facility. Relying simply on the size of the queue is not reliable enough. For example, a process can be working on a message and then reject it, placing the message back on a queue that was previously "empty".

Rather, make the batch a first-class concept. Consider sending a "batch start" message that contains the number of items in the batch. Then, as messages are processed, the workers can update a batch status record or some other device. The batch status can track the number of messages processed, the number that passed, the number that failed, etc.

As each message is processed, the worker can check the batch status to see whether it is handling the "last message": that is the case when the processed count equals the batch count "minus 1" (the message currently being processed has not been counted yet).

You'll want to make this process atomic. For example, if you're using SQL, you'll want to fetch the batch status row "FOR UPDATE", which locks the row to your transaction so that your comparison is atomic.
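A minimal sketch of such a batch status record, assuming an in-memory object shared by all workers instead of a database row; `BatchStatus`, `recordResult` and `BatchStatusDemo` are hypothetical names. The `synchronized` method stands in for the row lock that `FOR UPDATE` would take: the "processed count equals batch count minus 1" comparison and the increment happen as one step, so exactly one worker sees itself as the last.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical batch status record, created from a "batch start" message.
class BatchStatus {
    private final int batchSize;
    private int processed;
    private int passed;
    private int failed;

    BatchStatus(int batchSize) {
        this.batchSize = batchSize;
    }

    // Returns true for exactly one caller: the one handling the last message.
    // synchronized plays the role of SELECT ... FOR UPDATE on a status row.
    synchronized boolean recordResult(boolean success) {
        boolean isLast = processed == batchSize - 1; // compare before counting this one
        processed++;
        if (success) {
            passed++;
        } else {
            failed++;
        }
        return isLast;
    }

    synchronized int passed() { return passed; }

    synchronized int failed() { return failed; }
}

class BatchStatusDemo {
    // Runs a whole batch on a thread pool and counts "last message" signals.
    static int countLastMessageSignals(int batchSize, int threads) {
        BatchStatus status = new BatchStatus(batchSize);
        AtomicInteger lastSignals = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < batchSize; i++) {
            boolean success = i % 4 != 0; // pretend every fourth message fails
            pool.submit(() -> {
                if (status.recordResult(success)) {
                    lastSignals.incrementAndGet(); // here: mark the batch completed
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lastSignals.get();
    }
}
```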

You could also put a trigger on the row, and have it check, if that's more your style.

Or you could have a global object in your system that manages this for you. There are all sorts of mechanisms.

But the key is that you have some overarching batch concept to manage all of the workers. You can't do this at the individual worker level, not reliably.

As a combination of Will and Dan's answers, I'd suggest a batch administration queue where "Batch Start" messages with a batch size counter arrive, together with "Message Processed" messages, sent by the consumers when they're done processing a message.

Its single administration consumer can count the processed messages as they arrive until the count matches the batch size, and then log that the batch is done.
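A sketch of the administration consumer's bookkeeping, assuming each message on the administration queue is either a "Batch Start" carrying the batch size or a "Message Processed" carrying only the batch id; `AdminEvent` and `BatchAdministrator` are hypothetical names. Since only one consumer reads that queue, the counting needs no locks at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical message type for the batch administration queue.
class AdminEvent {
    final String batchId;
    final int batchSize; // set for "Batch Start", -1 for "Message Processed"

    private AdminEvent(String batchId, int batchSize) {
        this.batchId = batchId;
        this.batchSize = batchSize;
    }

    static AdminEvent batchStart(String batchId, int batchSize) {
        return new AdminEvent(batchId, batchSize);
    }

    static AdminEvent messageProcessed(String batchId) {
        return new AdminEvent(batchId, -1);
    }

    boolean isBatchStart() {
        return batchSize >= 0;
    }
}

// The single administration consumer. Only one thread calls onEvent, so plain
// HashMaps are safe here without any locking.
class BatchAdministrator {
    private final Map<String, Integer> expected = new HashMap<>();
    private final Map<String, Integer> processed = new HashMap<>();
    private final List<String> completed = new ArrayList<>();

    void onEvent(AdminEvent event) {
        if (event.isBatchStart()) {
            expected.put(event.batchId, event.batchSize);
        } else {
            processed.merge(event.batchId, 1, Integer::sum);
        }
        // Checked after every event, so a "Message Processed" that arrives
        // before its "Batch Start" is still counted.
        Integer size = expected.get(event.batchId);
        int done = processed.getOrDefault(event.batchId, 0);
        if (size != null && done == size) {
            completed.add(event.batchId); // here: log that the batch is done
            expected.remove(event.batchId);
            processed.remove(event.batchId);
        }
    }

    List<String> completedBatches() {
        return completed;
    }
}
```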

To allow for error situations, you have to do a periodic check.

For example, suppose you have two consumers and one message in a queue. Consumer 1 picks up the message and starts to process it. Now consumer 1 crashes unexpectedly and the transaction is rolled back. The message now needs to be picked up and processed by consumer 2.

Therefore consumer 2 can't exit until all the messages have been successfully processed. The only way to verify this is to check the queue size periodically until it is empty. If consumer 2 just exits when there are no more messages for it, you will end up with unprocessed messages in the queue if consumer 1 has to roll back a transaction.
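To make the crash scenario concrete, here is an in-memory sketch, assuming a hypothetical `TransactedQueue` that mimics a transacted receive: a received message keeps counting until it is committed or rolled back. Consumer 2 only stops once nothing is pending and nothing is in flight, re-checking every `pollMillis` in between, so the rolled-back message is still picked up. `CrashDemo` is likewise a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory queue that mimics a broker's transacted receive: a
// received message still counts until the receiver commits or rolls back.
class TransactedQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private int inFlight;

    synchronized void send(String message) {
        pending.addLast(message);
    }

    // Returns null only when nothing is pending AND nothing is in flight.
    // While another consumer still holds an uncommitted message, this
    // re-checks periodically instead of letting the caller exit.
    synchronized String receive(long pollMillis) throws InterruptedException {
        while (pending.isEmpty()) {
            if (inFlight == 0) {
                return null;
            }
            wait(pollMillis); // periodic check; nobody calls notify() in this sketch
        }
        inFlight++;
        return pending.pollFirst();
    }

    synchronized void commit() {
        inFlight--;
    }

    synchronized void rollback(String message) {
        inFlight--;
        pending.addFirst(message); // redelivered to whoever receives next
    }
}

class CrashDemo {
    // One message, two consumers: consumer 1 takes it and then "crashes".
    static int run() {
        TransactedQueue queue = new TransactedQueue();
        queue.send("m1");
        AtomicInteger processed = new AtomicInteger();
        try {
            String taken = queue.receive(10); // consumer 1 receives the only message
            Thread consumer2 = new Thread(() -> {
                try {
                    while (queue.receive(10) != null) {
                        processed.incrementAndGet(); // real processing omitted
                        queue.commit();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumer2.start();
            Thread.sleep(50);      // give consumer 2 time to start polling
            queue.rollback(taken); // consumer 1 crashes: its transaction is rolled back
            consumer2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

If `receive` returned null as soon as `pending` was empty, consumer 2 would exit immediately and `m1` would be left behind after the rollback.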

Create an ActionMonitor responsible for marking the Action as finished. The different ActionConsumer instances will notify it when they are done. When the number of consumers finished is the same as the number of consumers that were running, the ActionMonitor marks the Action as finished.

With this solution, there's no need to add any extra queue or thread. The actual execution of marking the Action as finished will be performed by the same thread that consumed the last element.

It would look like this:

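public class ActionMonitor {
    private final int numberOfConsumers; // Total number of consumers, known in advance.
    private int numberOfConsumersFinished;

    public ActionMonitor(int numberOfConsumers) {
        this.numberOfConsumers = numberOfConsumers;
    }

    public synchronized void consumerFinished() { // Sync could be more efficient.
        numberOfConsumersFinished++;
        if (numberOfConsumers == numberOfConsumersFinished) {
            markTheActionAsFinished();
        }
    }

    private void markTheActionAsFinished() {
        // Application-specific: update the Action status to completed.
    }
}

public abstract class ActionConsumer {

    private final ActionMonitor actionMonitor;

    protected ActionConsumer(ActionMonitor actionMonitor) {
        this.actionMonitor = actionMonitor;
    }

    public void processElementsInAction() {
        while (moreElementsToProcess()) {
            takeNewElementAndProcessIt();
        }
        // Runs on this consumer's thread; the last one to finish marks the Action done.
        actionMonitor.consumerFinished();
    }

    // Application-specific hooks, left abstract here.
    protected abstract boolean moreElementsToProcess();

    protected abstract void takeNewElementAndProcessIt();
}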

Warning: you need to know in advance how many consumers there will be.
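Regarding the "Sync could be more efficient" comment, one possible lock-free variant, assuming Java's `AtomicInteger`, is sketched below; `AtomicActionMonitor`, `AtomicActionMonitorDemo` and the `onActionFinished` callback are hypothetical names. `incrementAndGet` returns a distinct value to each caller, so exactly one consumer sees the final count and runs the callback. It has the same limitation: the number of consumers must be known up front.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free variant of ActionMonitor.
class AtomicActionMonitor {
    private final int numberOfConsumers; // must be known in advance
    private final AtomicInteger numberOfConsumersFinished = new AtomicInteger();
    private final Runnable onActionFinished;

    AtomicActionMonitor(int numberOfConsumers, Runnable onActionFinished) {
        this.numberOfConsumers = numberOfConsumers;
        this.onActionFinished = onActionFinished;
    }

    void consumerFinished() {
        // Exactly one caller gets back numberOfConsumers, so the callback runs once.
        if (numberOfConsumersFinished.incrementAndGet() == numberOfConsumers) {
            onActionFinished.run();
        }
    }
}

class AtomicActionMonitorDemo {
    // Starts `consumers` threads that all finish at roughly the same time and
    // returns how often the action was marked as finished.
    static int run(int consumers) {
        AtomicInteger markCalls = new AtomicInteger();
        AtomicActionMonitor monitor =
                new AtomicActionMonitor(consumers, () -> markCalls.incrementAndGet());
        Thread[] threads = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            threads[i] = new Thread(monitor::consumerFinished);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return markCalls.get();
    }
}
```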

I hope it helps.
