
Replacing a scheduled task with Spring Events

In my Spring Boot app, customers can submit files. Each customer's files are merged together by a scheduled task that runs every minute. The fact that the merging is performed by a scheduler has a number of drawbacks, e.g. it's difficult to write end-to-end tests, because in the test you have to wait for the scheduler to run before retrieving the result of the merge.

Because of this, I would like to use an event-based approach instead, i.e.

  1. Customer submits a file
  2. An event is published that contains this customer's ID
  3. The merging service listens for these events and performs a merge operation for the customer in the event object

This would have the advantage of triggering the merge operation immediately after there is a file available to merge.

However, there are a number of problems with this approach which I would like some help with.

Concurrency

The merging is a reasonably expensive operation. It can take up to 20 seconds, depending on how many files are involved. Therefore the merging will have to happen asynchronously, i.e. not on the same thread which publishes the merge event. Also, I don't want to perform multiple merge operations for the same customer concurrently, in order to avoid the following scenario:

  1. Customer1 saves file2, triggering merge operation2 for file1 and file2
  2. A very short time later, customer1 saves file3, triggering merge operation3 for file1, file2, and file3
  3. Merge operation3 completes, saving merge-file3
  4. Merge operation2 completes, overwriting merge-file3 with merge-file2

To avoid this, I plan to process merge operations for the same customer in sequence using locks in the event listener, e.g.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

@Component
public class MergeEventListener implements ApplicationListener<MergeEvent> {

    private final ConcurrentMap<String, Lock> customerLocks = new ConcurrentHashMap<>();

    @Override
    public void onApplicationEvent(MergeEvent event) {
        var customerId = event.getCustomerId();
        var customerLock = customerLocks.computeIfAbsent(customerId, key -> new ReentrantLock());
        customerLock.lock();
        try {
            mergeFileForCustomer(customerId);
        } finally {
            // Release the lock even if the merge throws; otherwise this
            // customer's lock would be held forever.
            customerLock.unlock();
        }
    }

    private void mergeFileForCustomer(String customerId) {
        // implementation omitted
    }
}
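For completeness, here is a sketch of how I imagine making the merge asynchronous while still serializing merges per customer, using one single-threaded executor per customer instead of locks (untested, and the names are mine):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

@Component
public class AsyncMergeEventListener implements ApplicationListener<MergeEvent> {

    // One single-threaded executor per customer: merges for the same customer
    // run strictly in sequence, while different customers merge in parallel.
    // A real implementation would also need to bound or clean up this map.
    private final ConcurrentMap<String, ExecutorService> customerExecutors = new ConcurrentHashMap<>();

    @Override
    public void onApplicationEvent(MergeEvent event) {
        var customerId = event.getCustomerId();
        customerExecutors
                .computeIfAbsent(customerId, key -> Executors.newSingleThreadExecutor())
                .submit(() -> mergeFileForCustomer(customerId));
    }

    private void mergeFileForCustomer(String customerId) {
        // implementation omitted
    }
}

This keeps the publishing thread free, but it does not solve fault-tolerance: tasks queued in the executor are lost on shutdown.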

Fault-Tolerance

How do I recover if, for example, the application shuts down in the middle of a merge operation, or an error occurs during a merge operation?

One of the advantages of the scheduled approach is that it contains an implicit retry mechanism, because every time it runs it looks for customers with unmerged files.

Summary

I suspect my proposed solution may be re-implementing (badly) an existing technology for this type of problem, e.g. JMS. Is my proposed solution advisable, or should I use something like JMS instead? The application is hosted on Azure, so I can use any services it offers.

If my solution is advisable, how should I deal with fault-tolerance?

Regarding the concurrency part, I think the approach with locks would work fine, if the number of files submitted per customer (in a given timeframe) is small enough.

You could monitor the number of threads waiting on the lock over time to see whether there is a lot of contention. If there is, you could accumulate a number of merge events (over a specific timeframe) and then run a single merge operation for the batch, which in effect leads back to a solution similar to the one with the scheduler.
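For illustration, a rough sketch of such coalescing, assuming one single-threaded executor per customer (all names are made up):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CoalescingMergeRunner {

    private final Set<String> pendingCustomers = ConcurrentHashMap.newKeySet();
    private final ConcurrentMap<String, ExecutorService> customerExecutors = new ConcurrentHashMap<>();

    public void requestMerge(String customerId) {
        // Only queue a merge if one is not already waiting for this customer;
        // a burst of submissions then collapses into a single merge run.
        if (pendingCustomers.add(customerId)) {
            customerExecutors
                    .computeIfAbsent(customerId, key -> Executors.newSingleThreadExecutor())
                    .submit(() -> {
                        pendingCustomers.remove(customerId);
                        // merges all files that are unmerged at this moment
                        mergeFilesForCustomer(customerId);
                    });
        }
    }

    private void mergeFilesForCustomer(String customerId) {
        // implementation omitted
    }
}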

In terms of fault-tolerance, an approach based on a message queue would work (I haven't worked with JMS, but it is a Java API for message queuing rather than a queue itself).

I would go with a cloud-based message queue (SQS, for example) simply for reliability. The approach would be:

  • Push merge events into the queue
  • The merging service reads one event at a time and starts the merge job
  • When the merge job has finished, the message is removed from the queue

That way, if something goes wrong during the merge process, the message stays in the queue and will be read again when the app is restarted.
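For illustration, a minimal sketch of that consume-then-delete loop with the AWS SDK for Java v2 (the queue URL and message payload are assumptions):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class MergeQueueConsumer {

    private final SqsClient sqs = SqsClient.create();
    // hypothetical queue URL
    private final String queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/merge-events";

    public void poll() {
        var receive = ReceiveMessageRequest.builder()
                .queueUrl(queueUrl)
                .maxNumberOfMessages(1)
                .waitTimeSeconds(20) // long polling
                .build();
        for (Message message : sqs.receiveMessage(receive).messages()) {
            // the message body is assumed to be the customer id
            mergeFilesForCustomer(message.body());
            // Delete only after a successful merge; if the merge fails or the
            // app dies, the message reappears after the visibility timeout.
            sqs.deleteMessage(DeleteMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .receiptHandle(message.receiptHandle())
                    .build());
        }
    }

    private void mergeFilesForCustomer(String customerId) {
        // implementation omitted
    }
}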

My thoughts on this matter, after some consideration.

I restricted the possible solutions to Azure managed services, as specified by the OP.

Azure Blob Storage Function Trigger

Because this issue is about storing files, let's start by exploring Blob Storage with a trigger function that fires on file creation. According to the docs, Azure Functions can run for up to 230 seconds and have a default retry count of 5.
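For reference, a blob-triggered function in Java looks roughly like this (the container path and binding names are illustrative):

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.BindingName;
import com.microsoft.azure.functions.annotation.BlobTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;

public class FileUploadedFunction {

    // Fires whenever a blob is created or updated under the given path.
    @FunctionName("onFileUploaded")
    public void run(
            @BlobTrigger(name = "file",
                         path = "uploads/{customerId}/{fileName}",
                         connection = "AzureWebJobsStorage") byte[] content,
            @BindingName("customerId") String customerId,
            final ExecutionContext context) {
        context.getLogger().info("New file for customer " + customerId
                + " (" + content.length + " bytes)");
        // a merge could be started here, but nothing stops two uploads from
        // the same customer triggering overlapping executions
    }
}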

However, this solution requires that files from a single customer arrive in a manner that does not cause concurrency issues, so let's set it aside for now.

Azure Queue Storage

It does not guarantee first-in-first-out (FIFO) ordered delivery, hence it does not meet the requirements.

Storage queues and Service Bus queues - compared and contrasted: https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted

Azure Service Bus

Azure Service Bus is a FIFO queue and seems to meet the requirements.

https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted#compare-storage-queues-and-service-bus-queues

From the doc above, we see that large files are not suitable as message payloads. To solve this, files can be stored in Azure Blob Storage, and the message will contain a pointer to where the file can be found.


With Azure Service Bus and Azure Blob Storage selected, let's discuss the implementation caveats.

Queue Producer

On AWS, the producer side of the solution would look like this:

  1. A dedicated endpoint provides a pre-signed URL to the customer app
  2. The customer app uploads the file to S3
  3. A Lambda triggered by S3 object creation inserts a message into the queue

Unfortunately, Azure doesn't have a pre-signed URL equivalent yet (it has Shared Access Signatures, which are not equivalent), hence file uploads must be done through an endpoint which in turn stores the file in Azure Blob Storage. Since a file upload endpoint is required anyway, it seems appropriate to make that endpoint also responsible for inserting messages into the queue.
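A rough sketch of such an endpoint, assuming the azure-storage-blob and azure-messaging-servicebus SDKs (all names are illustrative):

import java.io.IOException;

import com.azure.messaging.servicebus.ServiceBusMessage;
import com.azure.messaging.servicebus.ServiceBusSenderClient;
import com.azure.storage.blob.BlobContainerClient;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class FileUploadController {

    private final BlobContainerClient container;      // configured elsewhere
    private final ServiceBusSenderClient queueSender; // configured elsewhere

    public FileUploadController(BlobContainerClient container,
                                ServiceBusSenderClient queueSender) {
        this.container = container;
        this.queueSender = queueSender;
    }

    @PostMapping("/customers/{customerId}/files")
    public void upload(@PathVariable String customerId,
                       @RequestParam("file") MultipartFile file) throws IOException {
        // 1. Store the file in Blob Storage.
        String blobName = customerId + "/" + file.getOriginalFilename();
        container.getBlobClient(blobName).upload(file.getInputStream(), file.getSize());

        // 2. Enqueue a merge event; the payload is a pointer, not the file itself.
        ServiceBusMessage message = new ServiceBusMessage(blobName);
        message.setSessionId(customerId); // keeps one customer's events together
        queueSender.sendMessage(message);
    }
}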

Queue Consumer

Because file merging takes a significant amount of time (~20 seconds), it should be possible to scale out the consumer side. With multiple consumers, we have to make sure that a single customer is processed by no more than one consumer instance at a time. This can be solved by using message sessions: https://docs.microsoft.com/en-us/azure/service-bus-messaging/message-sessions

To achieve fault tolerance, the consumer should use peek-lock (as opposed to receive-and-delete) during the file merge, and mark the message as completed once the merge has finished. When the message is marked as completed, the consumer may also be responsible for removing superfluous files from Blob Storage. A sketch combining sessions and peek-lock follows.
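A sketch of a session-aware, peek-lock consumer with the azure-messaging-servicebus SDK (queue name and concurrency settings are assumptions):

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusProcessorClient;
import com.azure.messaging.servicebus.ServiceBusReceivedMessageContext;

public class MergeQueueProcessor {

    public static ServiceBusProcessorClient start(String connectionString) {
        ServiceBusProcessorClient processor = new ServiceBusClientBuilder()
                .connectionString(connectionString)
                .sessionProcessor()            // one session (customer) per handler at a time
                .queueName("merge-events")     // hypothetical queue name
                .maxConcurrentSessions(4)      // scale across customers, never within one
                .disableAutoComplete()         // we complete manually after the merge
                .processMessage(MergeQueueProcessor::handle)
                .processError(context -> context.getException().printStackTrace())
                .buildProcessorClient();
        processor.start();
        return processor;
    }

    private static void handle(ServiceBusReceivedMessageContext context) {
        String customerId = context.getMessage().getSessionId();
        mergeFilesForCustomer(customerId);
        // Peek-lock: completing removes the message from the queue. If we crash
        // before this line, the lock expires and the message is redelivered.
        context.complete();
    }

    private static void mergeFilesForCustomer(String customerId) {
        // implementation omitted
    }
}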

Possible problems with both the existing and the future solution

If customer A starts uploading a huge file #1 and immediately afterwards starts uploading a small file #2, the upload of file #2 may complete before file #1 and cause an out-of-order situation.

I assume this is an issue that is solved in the existing solution by some kind of locking mechanism or file-naming convention.

Spring Boot with Kafka can solve your fault-tolerance problem.

Kafka supports the producer-consumer model: have the customer events posted via a Kafka producer.

Configure Kafka with replication so that no events are lost.

Use consumers that invoke the merging service for each event:

  1. Once the consumer has read and merged the event for a customerId, commit the offset.

  2. If a failure occurs while merging, the offset is not committed, so the event can be read again when the application starts up again.

  3. If the merging service can detect a duplicate event from the given data, reprocessing the same message should not cause any issue (Kafka's default guarantee is at-least-once delivery, so duplicates are possible). Duplicate-event detection is a safety check for an event that was processed in full but whose offset failed to be committed back to Kafka. A sketch of such a consumer follows.
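A sketch of such a consumer using Spring for Apache Kafka with manual offset commits (topic, group, and container configuration are assumptions):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class MergeEventConsumer {

    // Assumes the listener container factory is configured with
    // ContainerProperties.AckMode.MANUAL and a String deserializer.
    @KafkaListener(topics = "merge-events", groupId = "merge-service")
    public void onMergeEvent(String customerId, Acknowledgment ack) {
        mergeFilesForCustomer(customerId);
        // Commit the offset only after a successful merge; on failure or crash
        // the event is re-read when the application restarts.
        ack.acknowledge();
    }

    private void mergeFilesForCustomer(String customerId) {
        // must be idempotent: the same event may be delivered more than once
    }
}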

First, an event-based approach is correct for this scenario. You should use an external broker for pub-sub event messages.

Note that, by default, Spring publishes events synchronously.
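For illustration, a minimal way to move listeners off the publishing thread is @Async with @EnableAsync (a sketch, not from the original answer; note it provides no per-customer ordering by itself):

import org.springframework.context.annotation.Configuration;
import org.springframework.context.event.EventListener;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Component;

@Configuration
@EnableAsync
class AsyncConfig {
}

@Component
class AsyncMergeListener {

    // Runs on a task-executor thread, so publishing the event does not block
    // the request that submitted the file.
    @Async
    @EventListener
    public void on(MergeEvent event) {
        // merge logic here
    }
}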

Suppose you have four services:

  1. App Service
  2. Merge Service
  3. CDC Service (change data capture)
  4. Broker Service (Kafka, RabbitMQ, ...)

The main flow is based on the "Outbox Pattern":

  1. The App Service saves an event message to the outbox table (see the sketch after this list)
  2. The CDC Service watches the outbox table and publishes event messages from it to the Broker Service
  3. The Merge Service subscribes to the broker and receives the event messages (messages arrive in order)
  4. The Merge Service performs the merge action
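A minimal sketch of step 1, writing the outbox row in the same transaction as the business change (table and column names are illustrative):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class FileSubmissionService {

    private final JdbcTemplate jdbc;

    public FileSubmissionService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Transactional
    public void submitFile(String customerId, String fileName) {
        // The business row and the outbox row commit atomically: either both
        // become visible to the CDC service or neither does.
        jdbc.update("INSERT INTO customer_file (customer_id, file_name) VALUES (?, ?)",
                customerId, fileName);
        jdbc.update("INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
                customerId, "FileSubmitted", "{\"customerId\":\"" + customerId + "\"}");
    }
}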

You can use the Eventuate library for this flow.

Furthermore, you can apply DDD to your architecture, using the Axon Framework for the CQRS pattern to publish domain events and process them.

Refer to:

  1. Outbox pattern: https://microservices.io/patterns/data/transactional-outbox.html

It really sounds like a stream-processing or ETL tool might suit the job. When you are developing an app and you have prioritisation/queuing/batching requirements, it is easy to see how you can build a solution with a cron job + SQL database, maybe with a queue to decouple doing the work from producing it.

This may very well be the easiest thing to build, as you get a lot of granularity and control with this approach. If you believe you can meet your requirements this way fairly quickly and with low risk, you can do so.

There are software components more tailored to these tasks, but they do have learning curves, and the choice depends on what PaaS or cloud you are using. You get monitoring, scalability, availability, and resiliency out of the box, and an open-source or cloud service takes the management burden off your hands.

What to use will also depend on your priorities and requirements. If you want to go the ETL route, which is great at batching up jobs, you might want to use something like AWS Glue. If you want prioritisation functionality, you may want to use multiple queues; it really depends. You'll also want a monitoring dashboard to see what wait time you should expect for your merge, regardless of the approach.
