
Microservices architecture tasks system problem

At this moment I'm in the middle of writing my new application with a microservices architecture. A brief explanation of what my application will do is as follows:

  • Microservice A will scrape multiple e-commerce product pages and send all the scraped products one by one to my next microservice, which I will call B from now on. For each product that has no task with running: true, it scrapes the product and creates a new task with running: true.
  • Microservice B will handle each product it receives (updating the data in my database) and send all the data that changed compared to the database to my next microservice, which I will call C from now on.
  • Microservice C receives a changed product and sends a message to my Discord & Slack channels. When done, it will set the running task for this product to running: false.

What I'm currently struggling with is that I want microservice A to start scraping again for the products that have been processed by microservice C. For this I thought of some sort of task system, where each product being scraped also has a task ID linked to it. The only problems I currently have with this are:

  • A task might freeze, fail, or otherwise get stuck. To try to tackle this, I automatically stop tasks that are still running (a variable in the database) and started more than 5 minutes ago, roughly as in the sketch after this list. This isn't ideal in my head though, because it means a task could take up to 5 minutes to complete.
  • Since every product being scraped is assigned one task, I would have to quickly deploy a lot of microservice B instances to handle all the load correctly.
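For reference, the timeout-based cleanup from the first point could look roughly like the following sketch (the tasks collection and the startedAt field are assumptions for illustration, not taken from my actual code):

const { MongoClient } = require('mongodb');

// Minimal sketch: mark tasks that are still flagged as running but started
// more than 5 minutes ago as stopped. Collection and field names are assumed.
async function stopStaleTasks(db) {
  const fiveMinutesAgo = new Date(Date.now() - 5 * 60 * 1000);
  const result = await db.collection('tasks').updateMany(
    { running: true, startedAt: { $lt: fiveMinutesAgo } },
    { $set: { running: false, stoppedReason: 'timeout' } }
  );
  return result.modifiedCount; // number of tasks that were force-stopped
}

// Example usage (connection string is a placeholder):
// const client = await MongoClient.connect('mongodb://localhost:27017');
// await stopStaleTasks(client.db('scraper'));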

What I would like to ask is whether somebody has a method or tip on how to improve/implement such a system in my microservices. Each product needs to be scraped right after the previous one has finished. Currently, microservice A just checks with a setInterval whether it can find a running task for the product.

All of this is developed in NodeJS, and all of the information is saved in a MongoDB database. The communication between the microservices is done through RabbitMQ.
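For context, the hand-off from A to B over RabbitMQ is as simple as the following amqplib sketch; the queue name products.scraped and the message shape are assumptions for illustration:

const amqp = require('amqplib');

// Minimal sketch: microservice A publishes one scraped product to a durable
// queue that microservice B consumes from. The queue name is an assumption.
async function publishProduct(channel, product) {
  await channel.assertQueue('products.scraped', { durable: true });
  channel.sendToQueue(
    'products.scraped',
    Buffer.from(JSON.stringify(product)),
    { persistent: true } // keep the message if the broker restarts
  );
}

// Example usage:
// const connection = await amqp.connect('amqp://localhost');
// const channel = await connection.createChannel();
// await publishProduct(channel, { sku: '123', url: 'https://example.com/p/123' });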

Any help is very much appreciated.

I would like to add two points to this architecture. It seems that every microservice changes the state of the data over time, but the data source is the same.

1. Why not change the data status [state] at every microservice?

For now you are using a boolean value, running: true, for a job you have started. We can change it to a status with values like ['scraping', 'compare', 'notify']:

{
    ...
    status : 'scraping',
    jobId : 23,
    ...
}

Now, when the data reaches the last microservice C, C can publish a new job with a status of 'notify' for consumer microservice A; A can conditionally handle this scenario and re-scrape if required. Another benefit is that every microservice can conditionally identify a job on the basis of its status as well. Hence, in any case of failure or restart, every microservice will only perform a task if it fits its criteria. For example, microservice B won't start a job which doesn't have 'scraping' as its status. Basically, acknowledge your job only once it is completed, using channel.ack(message).
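A rough sketch of that pattern for microservice C with amqplib could look like this; the queue names, the message shape, and the sendNotifications() helper are hypothetical and only illustrate the ack-after-completion and publish-'notify' steps:

const amqp = require('amqplib');

// Hypothetical stand-in for the Discord & Slack notifications sent by C.
async function sendNotifications(job) {
  console.log('notifying channels about job', job.jobId);
}

async function startConsumerC() {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();
  await channel.assertQueue('products.changed', { durable: true }); // fed by B
  await channel.assertQueue('jobs.status', { durable: true });      // read by A

  channel.consume('products.changed', async (message) => {
    const job = JSON.parse(message.content.toString());
    try {
      await sendNotifications(job);
      // Tell A this job is done, so A can decide whether to re-scrape.
      channel.sendToQueue(
        'jobs.status',
        Buffer.from(JSON.stringify({ ...job, status: 'notify' })),
        { persistent: true }
      );
      channel.ack(message);               // acknowledge only after the work succeeded
    } catch (err) {
      channel.nack(message, false, true); // requeue the message on failure
    }
  });
}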

2. Data synchronization

I would not recommend creating multiple B microservices as consumers; there might be an issue with data synchronization (while multiple B consumers work on the same page with different products). Instead, you can either measure your list of products on a per-page basis and adjust your queue configuration accordingly with some testing (but don't let the queues get too long, as that will deteriorate speed and affect performance), or bundle the products as one job and send it for processing.
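If you stay with a single B consumer, one way to tune how much work it takes on at once is the channel prefetch setting in amqplib; the sketch below assumes the same products.scraped queue as above, and the value 10 is only a starting point to adjust through testing:

const amqp = require('amqplib');

async function startConsumerB() {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();
  await channel.assertQueue('products.scraped', { durable: true });

  channel.prefetch(10); // deliver at most 10 unacknowledged messages to this consumer

  channel.consume('products.scraped', async (message) => {
    const product = JSON.parse(message.content.toString());
    // ...compare with MongoDB, update it, and forward the diff to C here...
    channel.ack(message);
  });
}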

