
How to perform long event processing in Node JS with a message queue?

I am building an email processing pipeline in Node JS with Google Pub/Sub as a message queue. The message queue has a limitation: it needs an acknowledgment for a sent message within 10 minutes. However, the jobs it sends to the Node JS server might take an hour to complete, so the same job might run multiple times until one of them finishes. I'm worried that this will block the Node JS event loop and slow down the server too.

An architecture diagram is attached. My questions are:

  1. Should I be using a message queue to start this long-running job, given that the message queue expects a response in 10 mins, or is there some other architecture I should consider?
  2. If multiple such jobs start, should I be worried about the Node JS event loop being blocked? Each job is basically iterating through a MongoDB cursor, creating hundreds of thousands of emails.

[Architecture diagram]

Well, it sounds like you either should not be using that queue (with the timeout you can't change) or you should break up your jobs into pieces that easily finish long before the timeout. This sounds like a case of needing to match the tool to the requirements of the job; if that queue doesn't match your requirements, you probably need a different mechanism. I don't fully understand what you need from Google's pub/sub, but creating a queue of your own or finding a generic queue on NPM is generally fairly easy if you just want to serialize access to a bunch of jobs.
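For illustration, here is a minimal sketch of the break-it-up approach, assuming the current @google-cloud/pubsub client; the topic name, batch size, and message shape are all made-up for the example:

```js
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const BATCH_SIZE = 500; // illustrative: sized so one batch finishes well under 10 minutes

// Instead of publishing one giant "send 500k emails" job, fan it out
// into many small batch messages that each ack comfortably in time.
async function fanOutJob(jobId, totalRecipients) {
  const topic = pubsub.topic('email-batches'); // hypothetical topic name
  const publishes = [];
  for (let offset = 0; offset < totalRecipients; offset += BATCH_SIZE) {
    publishes.push(topic.publishMessage({ json: { jobId, offset, limit: BATCH_SIZE } }));
  }
  await Promise.all(publishes);
}
```

Each subscriber then handles one small slice and acks it, so no single message ever comes near the deadline.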

I rather doubt you have nodejs event loop blockage issues as long as all your I/O is using asynchronous methods. Nothing you're doing sounds CPU-heavy, and that's what blocks the event loop (long-running CPU-heavy operations). Your whole project is probably limited by both MongoDB and whatever you're using to send the emails, so you should make sure you're not overwhelming either one of those to the point where they become sluggish and lose throughput.
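As a rough sketch of why that is, here is what such a job loop might look like (the collection, field names, and mailer are made-up stand-ins); every `await` hands control back to the event loop:

```js
const { MongoClient } = require('mongodb');

// Stub mailer so the sketch is self-contained; swap in a real sender.
async function sendEmail(address) {
  console.log(`sending to ${address}`);
}

// Iterating the cursor with async I/O: the event loop stays free to
// serve other requests while each fetch and send is in flight.
async function processJob(db, jobId) {
  const cursor = db.collection('recipients').find({ jobId });
  for await (const recipient of cursor) {
    await sendEmail(recipient.address);
  }
}

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  await processJob(client.db('mail'), 'job-123'); // illustrative names
  await client.close();
}
main().catch(console.error);
```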

To answer the original questions:

  1. Should I be using a message queue to start this long-running job, given that the message queue expects a response in 10 mins, or is there some other architecture I should consider?

Yes, a message queue works well for dealing with these kinds of events. The important thing is to make sure the final action is idempotent, so that even if you process duplicate events by accident, the final result is applied only once. This guide from Google Cloud is a helpful resource on making your subscriber idempotent.
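As one possible shape for that (not the guide's exact code), a subscriber can record completed job IDs and treat redeliveries as no-ops; the subscription name, collection name, and `runJob` below are assumptions for the sketch:

```js
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();

// Hypothetical long-running worker, e.g. the cursor loop sketched earlier.
async function runJob(jobId) { /* ... */ }

function listen(db) {
  const subscription = pubsub.subscription('email-jobs'); // hypothetical name
  subscription.on('message', async (message) => {
    const { jobId } = JSON.parse(message.data.toString());
    // Idempotency check: a redelivered message for a finished job is a no-op.
    const alreadyDone = await db.collection('completedJobs').findOne({ jobId });
    if (alreadyDone) {
      message.ack();
      return;
    }
    await runJob(jobId);
    await db.collection('completedJobs').insertOne({ jobId });
    message.ack();
  });
}
```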

To get around the 10 min limit of Pub/Sub, I ended up creating an in-memory table that tracks active jobs. If a job is actively being processed and Pub/Sub sends the message again, the handler does nothing. If the server restarts and loses the job, the in-memory table also disappears, so the job can be processed again if it was incomplete.
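A minimal version of that in-memory table might look like the following (names are illustrative; `runJob` is the hypothetical worker from the earlier sketch):

```js
const activeJobs = new Set(); // in-memory table of jobs currently running

async function handleMessage(message) {
  const { jobId } = JSON.parse(message.data.toString());
  if (activeJobs.has(jobId)) {
    return; // already in progress here: ignore the redelivery, don't ack
  }
  activeJobs.add(jobId);
  try {
    await runJob(jobId);
    message.ack(); // acknowledge only once the work has actually finished
  } finally {
    activeJobs.delete(jobId); // gone after a restart too, so the job retries
  }
}
```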

  2. If multiple such jobs start, should I be worried about the Node JS event loop being blocked? Each job is basically iterating through a MongoDB cursor, creating hundreds of thousands of emails.

I have ignored this for now, as per the comment left by jfriend00. You can also rate-limit the number of jobs being processed, as in the sketch below.
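For example, a small concurrency cap in plain JavaScript could look like this (the limit is an assumption; the official client also has a `flowControl.maxMessages` subscription option that serves a similar purpose):

```js
const MAX_CONCURRENT = 2; // illustrative limit
let running = 0;
const waiting = [];

// Run at most MAX_CONCURRENT tasks at once; extras wait their turn.
async function withLimit(task) {
  if (running >= MAX_CONCURRENT) {
    await new Promise((resolve) => waiting.push(resolve));
  }
  running += 1;
  try {
    return await task();
  } finally {
    running -= 1;
    const next = waiting.shift();
    if (next) next(); // wake the next waiting task
  }
}

// Usage: subscription.on('message', (m) => withLimit(() => handleMessage(m)));
```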
