
Replacing a scheduled task with Spring Events

In my Spring Boot app, customers can submit files. Each customer's files are merged together by a scheduled task that runs every minute. The fact that the merging is performed by a scheduler has a number of drawbacks, e.g. it's difficult to write end-to-end tests, because in the test you have to wait for the scheduler to run before retrieving the result of the merge.

Because of this, I would like to use an event-based approach instead, i.e.

  1. Customer submits a file
  2. An event is published that contains this customer's ID
  3. The merging service listens for these events and performs a merge operation for the customer in the event object

This would have the advantage of triggering the merge operation immediately after there is a file available to merge.

However, there are a number of problems with this approach which I would like some help with.

Concurrency

The merging is a reasonably expensive operation. It can take up to 20 seconds, depending on how many files are involved. Therefore the merging will have to happen asynchronously, i.e. not as part of the same thread which publishes the merge event. Also, I don't want to perform multiple merge operations for the same customer concurrently, in order to avoid the following scenario:

  1. Customer1 saves file2, triggering merge operation2 for file1 and file2
  2. A very short time later, customer1 saves file3, triggering merge operation3 for file1, file2, and file3
  3. Merge operation3 completes, saving merge-file3
  4. Merge operation2 completes, overwriting merge-file3 with merge-file2

To avoid this, I plan to process merge operations for the same customer in sequence using locks in the event listener, e.g.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;

@Component
public class MergeEventListener implements ApplicationListener<MergeEvent> {

    private final ConcurrentMap<String, Lock> customerLocks = new ConcurrentHashMap<>();

    @Override
    public void onApplicationEvent(MergeEvent event) {
        var customerId = event.getCustomerId();
        // One lock per customer, so merges for the same customer run strictly in sequence.
        var customerLock = customerLocks.computeIfAbsent(customerId, key -> new ReentrantLock());
        customerLock.lock();
        try {
            mergeFileForCustomer(customerId);
        } finally {
            // Release the lock even if the merge throws, otherwise this customer would be blocked forever.
            customerLock.unlock();
        }
    }

    private void mergeFileForCustomer(String customerId) {
        // implementation omitted
    }
}

Fault-Tolerance

How do I recover if, for example, the application shuts down in the middle of a merge operation, or an error occurs during a merge operation?

One of the advantages of the scheduled approach is that it contains an implicit retry mechanism, because every time it runs it looks for customers with unmerged files.

Summary

I suspect my proposed solution may be re-implementing (badly) an existing technology for this type of problem, e.g. JMS. Is my proposed solution advisable, or should I use something like JMS instead? The application is hosted on Azure, so I can use any services it offers.

If my solution is advisable, how should I deal with fault-tolerance?

Regarding the concurrency part, I think the approach with locks would work fine, if the number of files submitted per customer (in a given timeframe) is small enough.

You can eventually monitor over time the number of threads waiting for the lock to see if there is a lot of contention. If there is, then maybe you can accumulate a number of merge events (over a specific timeframe) and then run a parallel merge operation, which in fact leads to a solution similar to the one with the scheduler.
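
A rough sketch of that monitoring idea, assuming the per-customer lock map from the question is shared as a bean and typed as ReentrantLock (whose getQueueLength() returns an estimate of the number of waiting threads); the class name and the one-minute interval are arbitrary choices for illustration:

import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class MergeLockContentionMonitor {

    // The same per-customer lock map used by the event listener,
    // typed as ReentrantLock so that getQueueLength() is available.
    private final ConcurrentMap<String, ReentrantLock> customerLocks;

    public MergeLockContentionMonitor(ConcurrentMap<String, ReentrantLock> customerLocks) {
        this.customerLocks = customerLocks;
    }

    // Logs an estimate of the contention once a minute (requires @EnableScheduling in the configuration).
    @Scheduled(fixedRate = 60_000)
    public void reportContention() {
        customerLocks.forEach((customerId, lock) -> {
            int waiting = lock.getQueueLength(); // estimated number of threads waiting for this lock
            if (waiting > 0) {
                System.out.printf("customer %s: %d merge operation(s) waiting%n", customerId, waiting);
            }
        });
    }
}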

In terms of fault-tolerance, an approach based on a message queue would work (I haven't worked with JMS, but I see that it is an implementation of a message queue).

I would go with a cloud-based message queue (SQS, for example) simply for reliability purposes. The approach would be:

  • Push merge events into the queue
  • The merging service scans one event at a time and starts the merge job
  • When the merge job is finished, the message is removed from the queue

That way, if something goes wrong during the merge process, the message stays in the queue and it will be read again when the app is restarted.
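
A minimal sketch of that consumer loop with the AWS SDK v2 for SQS; the queue URL and the mergeFilesForCustomer helper are hypothetical, and the fault tolerance comes from deleting the message only after the merge has succeeded (an unacknowledged message becomes visible again once its visibility timeout expires):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class MergeQueueConsumer {

    private final SqsClient sqs = SqsClient.create();
    // Hypothetical queue URL, for illustration only.
    private final String queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/merge-events";

    public void poll() {
        ReceiveMessageRequest receive = ReceiveMessageRequest.builder()
                .queueUrl(queueUrl)
                .maxNumberOfMessages(1)   // one merge event at a time
                .waitTimeSeconds(20)      // long polling
                .build();

        for (Message message : sqs.receiveMessage(receive).messages()) {
            String customerId = message.body();
            // If this throws, the message is NOT deleted and will be read again later.
            mergeFilesForCustomer(customerId);
            sqs.deleteMessage(DeleteMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .receiptHandle(message.receiptHandle())
                    .build());
        }
    }

    private void mergeFilesForCustomer(String customerId) {
        // implementation omitted
    }
}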

My thoughts on this matter, after some consideration.

I restricted possible solutions to what's available from Azure managed services, according to specifications from OP.

Azure Blob Storage Function Trigger

Because this issue is about storing files, let's start by exploring Blob Storage with a trigger function that fires on file creation. According to the docs, Azure Functions can run for up to 230 seconds and have a default retry count of 5.

But this solution would require that files from a single customer arrive in a manner that does not cause concurrency issues, hence let's leave this solution for now.

Azure Queue Storage

It does not guarantee first-in-first-out (FIFO) ordered delivery, hence it does not meet the requirements.

Storage queues and Service Bus queues - compared and contrasted: https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted

Azure Service Bus

Azure Service Bus is a FIFO queue, and seems to meet the requirements.

https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted#compare-storage-queues-and-service-bus-queues

From the doc above, we see that large files are not suited as message payloads. To solve this, files may be stored in Azure Blob Storage, and the message will contain the information needed to find the file.


With Azure Service Bus and Azure Blob Storage selected, let's discuss the implementation caveats.

Queue Producer

On AWS, the solution for the producer side would have been like this:

  1. A dedicated end-point provides a pre-signed URL to the customer app
  2. The customer app uploads the file to S3
  3. A Lambda triggered by the S3 object creation inserts a message into the queue

Unfortunately, Azure doesn't have a pre-signed URL equivalent yet (they have Shared Access Signatures, which are not the same), hence file uploads must be done through an end-point which in turn stores the file in Azure Blob Storage. Since a file upload end-point is required anyway, it seems appropriate to let that end-point also be responsible for inserting messages into the queue.
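
To make the producer side concrete, here is a minimal sketch of such an upload end-point using the Azure SDKs (azure-storage-blob and azure-messaging-servicebus); the container name customer-files, the queue name merge-events and the connection strings are assumptions made for illustration, not part of the original question:

import java.io.InputStream;

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusMessage;
import com.azure.messaging.servicebus.ServiceBusSenderClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;

public class FileUploadService {

    private final BlobContainerClient container;
    private final ServiceBusSenderClient sender;

    public FileUploadService(String storageConnectionString, String serviceBusConnectionString) {
        this.container = new BlobContainerClientBuilder()
                .connectionString(storageConnectionString)
                .containerName("customer-files")   // hypothetical container name
                .buildClient();
        this.sender = new ServiceBusClientBuilder()
                .connectionString(serviceBusConnectionString)
                .sender()
                .queueName("merge-events")         // hypothetical queue name
                .buildClient();
    }

    public void upload(String customerId, String fileName, InputStream content, long length) {
        // 1. Store the file itself in Blob Storage.
        String blobName = customerId + "/" + fileName;
        container.getBlobClient(blobName).upload(content, length);

        // 2. Publish a small message that only points at the blob.
        ServiceBusMessage message = new ServiceBusMessage(blobName);
        message.setSessionId(customerId); // session id = customer id, used by the session-based consumer below
        sender.sendMessage(message);
    }
}

Setting the session id to the customer id is what later allows the consumer side to process at most one message per customer at a time.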

Queue Consumer

Because file merging takes a significant amount of time (~20 seconds), it should be possible to scale out the consumer side. With multiple consumers, we'll have to make sure that a single customer is processed by no more than one consumer instance. This can be solved by using message sessions: https://docs.microsoft.com/en-us/azure/service-bus-messaging/message-sessions

In order to achieve fault tolerance, the consumer should use peek-lock (as opposed to receive-and-delete) during the file merge, and mark the message as completed when the merge is finished. When the message is marked as completed, the consumer may also be responsible for removing superfluous files from Blob Storage.
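
A minimal sketch of the consumer side under the same assumptions (a merge-events queue, messages carrying a blob name, session id = customer id); disableAutoComplete() keeps the peek-lock semantics explicit, so the message is only completed after a successful merge:

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusProcessorClient;
import com.azure.messaging.servicebus.ServiceBusReceivedMessageContext;

public class MergeSessionConsumer {

    public static ServiceBusProcessorClient start(String serviceBusConnectionString) {
        ServiceBusProcessorClient processor = new ServiceBusClientBuilder()
                .connectionString(serviceBusConnectionString)
                .sessionProcessor()              // sessions: one customer handled by at most one consumer at a time
                .queueName("merge-events")       // hypothetical queue name
                .maxConcurrentSessions(4)        // scale out across customers, never within one customer
                .disableAutoComplete()           // peek-lock: we complete explicitly after a successful merge
                .processMessage(MergeSessionConsumer::handle)
                .processError(context -> context.getException().printStackTrace())
                .buildProcessorClient();
        processor.start();
        return processor;
    }

    private static void handle(ServiceBusReceivedMessageContext context) {
        String blobName = context.getMessage().getBody().toString();
        mergeFilesReferencedBy(blobName); // if this throws, the lock expires and the message is redelivered
        context.complete();               // only now is the message removed from the queue
    }

    private static void mergeFilesReferencedBy(String blobName) {
        // implementation omitted
    }
}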

Possible problems with both the existing solution and the future solution

If customer A starts uploading a huge file #1 and immediately after that starts uploading a small file #2, the upload of file #2 may be completed before file #1 and cause an out-of-order situation.

I assume that this is an issue that is solved in the existing solution by using some kind of locking mechanism or file name convention.

Spring Boot with Kafka can solve your problem of fault tolerance.

Kafka supports the producer-consumer model: let the customer events be published by a Kafka producer.

Configure Kafka with replication so as not to lose any events.

Use consumers that can invoke the merging service for each event:

  1. Once the consumer has read the event for a customerId and merged the files, commit the offset (see the sketch after this list).

  2. In case of any failure while merging, the offset is not committed, so the event can be read again when the application starts again.

  3. If the merging service can detect a duplicate event from the given data, then reprocessing the same message should not cause any issue (Kafka promises single delivery of the event). Duplicate-event detection is a safety check for an event that was fully processed but whose offset failed to be committed to Kafka.
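
A minimal sketch of such a consumer with the plain kafka-clients API; the broker address, the merge-events topic and the assumption that events are keyed by customer id are illustrative only. Auto-commit is disabled so the offset is committed only after the merge succeeds:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MergeEventConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        props.put("group.id", "merge-service");
        props.put("enable.auto.commit", "false");           // commit only after a successful merge
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("merge-events"));     // hypothetical topic name
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    String customerId = record.key();        // events are assumed to be keyed by customer id
                    mergeFilesForCustomer(customerId);       // if this throws, the offset is never committed
                }
                consumer.commitSync();                       // acknowledge only what has actually been merged
            }
        }
    }

    private static void mergeFilesForCustomer(String customerId) {
        // implementation omitted
    }
}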

First, an event-based approach is correct for this scenario. You should use an external broker for pub-sub event messages.

Note that, by default, Spring publishes events synchronously.
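
If you do keep in-process Spring events, one common way to make the listener asynchronous (a sketch, not necessarily what the question's code base looks like) is @EnableAsync together with @Async on the listener method:

import org.springframework.context.annotation.Configuration;
import org.springframework.context.event.EventListener;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Component;

@Configuration
@EnableAsync
class AsyncConfig {
    // Enables @Async processing with the default task executor.
}

@Component
class AsyncMergeEventListener {

    // Runs on a task-executor thread, so publishing the event does not block the upload request.
    @Async
    @EventListener
    public void onMergeEvent(MergeEvent event) {
        // merge logic (or delegate to the lock-based listener shown in the question)
    }
}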

Suppose that you have the following services:

  1. App Service
  2. Merge Service
  3. CDC Service (change data capture)
  4. Broker Service (Kafka, RabbitMQ, ...)

The main flow, based on the "Outbox Pattern":

  1. The App Service saves the event message to an Outbox message table
  2. The CDC Service watches the Outbox table and publishes the event messages from it to the Broker Service
  3. The Merge Service subscribes to the Broker Service and receives the event messages (the messages are ordered)
  4. The Merge Service performs the merge action

You can use the eventuate lib for this flow.
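
To make step 1 of that flow concrete, here is a minimal sketch of the producer side with Spring's JdbcTemplate; the customer_file and outbox tables and their columns are assumptions made for illustration. The important point is that the business row and the outbox row are written in the same local transaction, so an event is recorded if and only if the file itself was saved:

import java.util.UUID;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class FileSubmissionService {

    private final JdbcTemplate jdbcTemplate;

    public FileSubmissionService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Both inserts commit or roll back together; the CDC service later picks up the outbox row.
    @Transactional
    public void submitFile(String customerId, String fileName) {
        jdbcTemplate.update(
                "insert into customer_file (customer_id, file_name) values (?, ?)",
                customerId, fileName);

        jdbcTemplate.update(
                "insert into outbox (id, aggregate_id, payload) values (?, ?, ?)",
                UUID.randomUUID().toString(), customerId, fileName);
    }
}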

Furthermore, you can apply DDD to your architecture, using the Axon framework for the CQRS pattern: publish domain events and process them.

Refer to:

  1. Outbox pattern: https://microservices.io/patterns/data/transactional-outbox.html

It really sounds like you may be able to do this with a Stream or an ETL tool. When you are developing an app and you have some prioritisation/queuing/batching requirement, it is easy to see how you can build a solution with a Cron + SQL Database, with maybe a queue to decouple doing work from producing work.

This may very well be the easiest thing to build, as you have a lot of granularity and control with this approach. If you believe that you can in fact meet your requirements this way fairly quickly and with low risk, you can do so.

There are software components which are more tailored to these tasks, but they do have some learning curve, and the choice depends on which PaaS or cloud you may be using. You'll get monitoring, scalability and availability/resiliency out of the box, and an open-source or cloud service will take the burden of management off your hands.

What to use will also depend on your priorities and requirements. If you want to go the ETL approach, which is great at banking up jobs, you might want to use something like Glue. If you want prioritisation functionality you may want to use multiple queues; it really depends. You'll also want to monitor with a dashboard to see what wait time you should expect for your merge, regardless of the approach.
