
Looking for an ideal Azure service for fair data processing in a producer–consumer schema

I am trying to manage the situation illustrated in the following picture.

[diagram: each producer with its own queue and its own dedicated consumer]

We have several clients = producers (also 1 client = 1 to N producers). We need to process the data from each producer fairly (separately would be the best option), because one producer may send us an enormous amount of data, and if there were just one queue for all producers, the rest of the producers would be blocked. So we need to ensure that no producer is blocked and that all of them are served fairly by the consumers.

The ideal schema would be for each producer to have its own queue and its own consumer (asynchronously waiting for any message sent to the queue). This schema is shown in the picture above.

The problem is that the number of producers will grow dynamically with the number of clients, so we need to create queues and consumers dynamically as well.

Moreover, we need to manage another difficulty with the data from producers. We need to ensure that older data from Producer A is processed before newer data from the same Producer A. (Note: producers are independent of each other, so newer data from Producer B can be processed before older data from Producer A.)

According to my research, it would not be a problem to use Azure Queue storage as the queue (queues can be created dynamically via queue.CreateIfNotExists(); ). BUT I do not know what to use as a proper consumer. I know there are a lot of Azure services, for example: Azure Functions, Azure WebJobs, Azure Event Hubs, etc.
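One practical detail with creating queues dynamically per producer: Azure Storage queue names are restricted to 3–63 characters of lowercase letters, digits, and hyphens, so the producer id usually has to be sanitized first. A minimal sketch (QueueNameFor is a hypothetical helper of my own, not an SDK call; a production version would also collapse consecutive hyphens and trim trailing ones, which the naming rules forbid):

```csharp
using System;
using System.Text;

static class QueueNames
{
    // Hypothetical helper: derive a valid, stable per-producer queue name
    // (lowercase letters, digits, hyphens; max 63 chars) before calling
    // queue.CreateIfNotExists() on it.
    public static string QueueNameFor(string producerId)
    {
        var sb = new StringBuilder("producer-");
        foreach (var c in producerId.ToLowerInvariant())
            sb.Append(char.IsLetterOrDigit(c) ? c : '-');
        return sb.ToString(0, Math.Min(sb.Length, 63));
    }
}
```

For example, `QueueNames.QueueNameFor("Client_A/01")` yields `producer-client-a-01`, which is then safe to pass to the queue-creation call.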

My question is: what is the best option to use as consumers for this use case?

We need to serve the queues as fairly as possible, so that no producer's queue is blocked by the others.

Thanks in advance for any tips!

UPDATE

I was thinking about the use case once again, and it resulted in a new schema; see the picture below:

[diagram: producers send meta messages to a single "master" queue; a WebJob dispatches them to Service Bus / Event Grid / Event Hub, which triggers consumer Azure Functions]

The biggest difference from the previous schema is that there is no longer a 1:1:1 relation between producer, queue, and consumer. Each producer does not need its own queue and consumer.

There will be just one "master" queue, where producers will send meta messages ("I have sent batch XY into Table storage A"). There will also be a WebJob triggered by that queue, whose main task will be sending the information into Service Bus / Event Grid / Event Hub (I am just not sure which one would be the best option).

Service Bus / Event Grid / Event Hub would then trigger an Azure Function, which would do the "consumer" work: grab the data from Table storage, do some transformation, and insert it into another structure.

The WebJob will also prevent two micro-batches from the same producer from being processed at the same time: it will postpone further batches until the previous batch has been processed.

Actually, instead of Service Bus / Event Grid / Event Hub, there could be just a WebJob with a thread pool (the consumers), waking up a consumer for each producer. Nevertheless, I don't think this is the best option for scaling with the number of customers, because the resources of a WebJob are not unlimited.
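That thread-pool idea can be sketched in-process (a hypothetical illustration only, using System.Threading.Channels rather than any Azure service): one channel per producer and one single-reader consumer per channel gives both fairness between producers and FIFO order within each producer, which are exactly the two requirements above.

```csharp
using System.Collections.Concurrent;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical in-process sketch: one unbounded channel per producer and one
// dedicated consumer task per channel. A flood from one producer fills only
// its own channel, so other producers are never blocked, and the single
// reader per channel preserves per-producer FIFO order.
class PerProducerDispatcher
{
    private readonly ConcurrentDictionary<string, Channel<string>> _channels = new();
    private readonly ConcurrentBag<Task> _consumers = new();
    public readonly ConcurrentQueue<string> Processed = new();

    // Create the producer's channel and its dedicated consumer on first use.
    public void Post(string producerId, string message)
    {
        var channel = _channels.GetOrAdd(producerId, _ =>
        {
            var ch = Channel.CreateUnbounded<string>();
            _consumers.Add(Task.Run(() => ConsumeAsync(producerId, ch.Reader)));
            return ch;
        });
        channel.Writer.TryWrite(message);
    }

    // The real "consumer" work would go here; this sketch just records order.
    private async Task ConsumeAsync(string producerId, ChannelReader<string> reader)
    {
        await foreach (var msg in reader.ReadAllAsync())
            Processed.Enqueue($"{producerId}:{msg}");
    }

    // Complete all channels and wait for the consumers to drain them.
    public async Task DrainAsync()
    {
        foreach (var ch in _channels.Values)
            ch.Writer.Complete();
        await Task.WhenAll(_consumers);
    }
}
```

The sketch also shows the weakness noted above: every producer costs a live task inside one process, so it scales only as far as that single WebJob's resources.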

The best option would be one of the above-mentioned services (Service Bus / Event Hub / Event Grid). For example, each producer would have its own topic in Service Bus, and each topic would trigger its own Azure Function (which would be the consumer).

I am wondering: is this a correct approach?

It's an interesting use case, but in my opinion, Functions will be your best choice for the consumer. I would encapsulate the business logic in a separate file, so that each individual function contains only the queue binding and the call into that separate file:

File 1:
public static async Task DoStuff(string myQueueItem, ILogger log)
{
    // transforms here
}

File 2 - n:

[FunctionName("ConsumerA")]
public static async Task QueueTrigger(
    [QueueTrigger("myqueue-items")] string myQueueItem,
    ILogger log)
{
    await DoStuff(myQueueItem, log);
}

Along with queue creation/deletion, you can use the management API to create and delete functions. Each time you create one, you adjust the function name and queue name so that it binds to the new queue.

You can control parallel processing somewhat by changing the "maxOutstandingRequests" property in the host.json file. On the Consumption plan, the max-requests setting applies per instance of the app: if you have it set to one and the host scales out to three instances, then three messages will be processed at a time. Scaling works a little differently on a standard App Service plan, but it can still be done dynamically. That dynamic scaling of consumer instances is likely to help with inconsistent data volumes: large data dumps will be processed more quickly, but you won't have resources sitting idle when things are quiet.
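As a side note: for queue-triggered functions specifically, the per-instance concurrency knobs in host.json are the queues settings rather than maxOutstandingRequests (which belongs to the HTTP section). A sketch, assuming the v2 host.json schema, that limits each instance to one queue message at a time:

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}
```

With batchSize 1 and newBatchThreshold 0, each host instance dequeues a single message and waits for it to finish before fetching the next, so total parallelism equals the number of instances.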

When the host scales out, each instance includes a copy of every function, so scaling up in response to a large data dump will actually increase the available throughput of your other queues as well, instead of decreasing it.
