Best way to distribute tasks considering latency and efficiency

Question

I'm looking for an algorithm to distribute some tasks. The problem is as follows:

Say I have a central task producer and several client consumers. The producer generates tasks; consumers take tasks (for starters, one at a time), process them, and take new tasks when they are done (I already have a task queue).

The thing is, if you consider the latency for a task to get from the producer to the consumer, it might make sense to group tasks together. For example, say we have 10 tasks in total and 2 consumers. If each task takes 5 ms to process and the network latency is also 5 ms, sending one group of 5 tasks to each consumer will take 5 ms + 5 * 5 ms = 30 ms, while sending the tasks individually would take 5 * 5 ms + 5 * 5 ms = 50 ms, because the latency overhead is paid for every task.
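The arithmetic above can be written as a tiny cost model. This is only a sketch: the function name total_time_ms and the assumption that transfer and processing never overlap are mine, not part of the question.

```cpp
#include <cassert>

// Time for ONE consumer to finish `tasks` tasks that arrive in batches of
// `batch_size`, paying `latency_ms` once per batch.
// Assumption: transfers and processing never overlap.
int total_time_ms(int tasks, int batch_size, int latency_ms, int task_ms) {
    assert(tasks >= 0 && batch_size > 0);
    int batches = (tasks + batch_size - 1) / batch_size;  // ceiling division
    return batches * latency_ms + tasks * task_ms;
}
```

With 5 tasks per consumer, a batch size of 5 gives the 30 ms case and a batch size of 1 gives the 50 ms case.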

It's not as simple as grouping, since some tasks will probably take longer, and it would make sense to send those separately so that the shorter tasks can be processed in parallel by the other consumers. I'm planning on gathering some statistics about the types of tasks. The number of consumers is also not constant.

Any idea of a good algorithm, or a good read, that can help me achieve this?

Answer 1

Once a producer has generated a task, not sending it right away will only increase that task's latency. Therefore I will assume that the task dispatcher works on snapshots of the current task queue: it takes all the tasks in the queue, sends them immediately in all directions, goes back to the queue, again takes all the tasks accumulated in the meantime, lather, rinse, repeat.

The dispatcher maintains an estimate of the completion time of each consumer. It orders the consumers by increasing completion time and adds a task to the batch of the consumer with the earliest completion time. Then it adds the average task time to that consumer's completion time estimate, thus obtaining a new estimate, reorders the consumers according to the new estimates (in O(log n) using a heap), and goes on to the next task. After all the tasks of the current snapshot have been assigned, it sends the batches to the consumers and goes to make a new snapshot.
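A minimal sketch of that assignment loop for one snapshot, using a min-heap keyed on the estimated completion time. The function name assign_snapshot, the use of int task ids, and the tie-breaking by consumer index are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Assigns every task of one snapshot to the consumer that currently has the
// earliest estimated completion time. `estimates[i]` is consumer i's estimate
// and is updated in place; the result holds the batch built for each consumer.
std::vector<std::vector<int>> assign_snapshot(const std::vector<int>& task_ids,
                                              std::vector<double>& estimates,
                                              double avg_task_time) {
    assert(!estimates.empty());
    using Entry = std::pair<double, std::size_t>;  // (estimate, consumer index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t i = 0; i < estimates.size(); ++i) heap.push({estimates[i], i});

    std::vector<std::vector<int>> batches(estimates.size());
    for (int id : task_ids) {
        Entry e = heap.top();          // consumer that finishes earliest
        heap.pop();
        batches[e.second].push_back(id);
        e.first += avg_task_time;      // new estimate after taking this task
        heap.push(e);                  // O(log n) reordering
    }
    while (!heap.empty()) {            // write the final estimates back
        estimates[heap.top().second] = heap.top().first;
        heap.pop();
    }
    return batches;
}
```

After the call, each non-empty batch would be sent to its consumer in one message, and the updated estimates carry over to the next snapshot.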

This policy will achieve equal consumer load on average. It can be improved:

if each consumer is able to provide feedback about its estimated completion time: the average task time multiplied by the number of tasks still pending in the consumer. This is more precise because the consumer uses the actual time of the completed tasks instead of the average;
if the time to process each task is either known or can be estimated per task, so that the dispatcher uses a per-task estimate instead of an average.

EDIT: Forgot to mention:

The completion time is estimated as start-time + average-task-time * number-of-tasks-sent-to-a-consumer + latency * number-of-batches-sent-to-a-consumer.
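Written out as code, the estimate could look like the sketch below. The struct ConsumerStats and its field names are hypothetical; they only mirror the terms of the formula.

```cpp
#include <cassert>

// Hypothetical per-consumer bookkeeping kept by the dispatcher.
struct ConsumerStats {
    double start_time;   // when the consumer started working
    int tasks_sent;      // number of tasks sent to this consumer so far
    int batches_sent;    // number of batches sent to this consumer so far
};

// start-time + average-task-time * tasks-sent + latency * batches-sent
double estimated_completion(const ConsumerStats& c, double avg_task_time, double latency) {
    assert(c.tasks_sent >= 0 && c.batches_sent >= 0);
    return c.start_time + avg_task_time * c.tasks_sent + latency * c.batches_sent;
}
```

Note that latency is charged per batch, not per task, which is what makes larger batches cheaper in this model.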

Answer 2

To clarify my comment on your question, let's suppose you have the following loop in your consumer:

while (keepConsuming) {
    Task t = Task::get();
    t.process();
}

You could rewrite it like this (supposing we can use OpenMP):

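// Assumes Task::get() returns a Task* here (NULL when there is no task).
Task *cur = NULL, *next = NULL;
do {
    // "parallel sections" creates the thread team; a bare "sections"
    // outside a parallel region would run the sections one after the other.
    #pragma omp parallel sections
    {
        #pragma omp section
        { if (cur != NULL) cur->process(); }             // process the current task
        #pragma omp section
        { next = keepConsuming ? Task::get() : NULL; }   // meanwhile, fetch the next one
    }
    cur = next;
} while (cur != NULL);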

This way, the process() and get() calls inside the loop are executed in parallel (obviously, assuming those two functions don't share any state).

Answer 3

Ahhh... the classic decision between fine-grained parallelism (which gives better load balancing but relatively higher synchronization overhead) and coarse-grained parallelism (which obviously gives the opposite). Sorry, but there's no easy answer...

Some thoughts:

Do lots of profiling; that's a good way to find a suitable number of tasks to group together. Just good old trial and error :)
Consider keeping a local task queue at each client. This can enable some sort of prefetch, e.g. when task n finishes, request task n+5 and start task n+1. Not sure if you are using multithreading or if task n+1 will be interrupted to accept task n+5.
Try to make the task representation as compact as possible. This may mean using char instead of int (this does make a difference for arrays). Maybe some parts of the task can be recomputed once it gets to the consumer.
Consider using some sort of timer on each consumer as feedback to adjust the number of tasks to take as a group next time. If a group took too much time, grab fewer tasks next time. Beware that a fancy heuristic may have some non-trivial overhead of its own.
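As an illustration of the timer feedback idea, here is one deliberately cheap heuristic: halve the group size when the last group overshot a target time, double it when it finished in less than half the target. The function name, the target time, and the halving/doubling rule are all assumptions; profiling would have to tell you whether they suit your workload.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns how many tasks to request next, given how long the last group took.
std::size_t next_batch_size(std::size_t current, double elapsed_ms, double target_ms,
                            std::size_t min_size = 1, std::size_t max_size = 64) {
    assert(current >= 1 && min_size >= 1 && min_size <= max_size && target_ms > 0.0);
    std::size_t next = current;
    if (elapsed_ms > target_ms) {
        next = current / 2;            // took too long: grab fewer next time
    } else if (elapsed_ms < target_ms / 2) {
        next = current * 2;            // finished quickly: grab more next time
    }
    return std::max(min_size, std::min(next, max_size));  // keep within bounds
}
```

The consumer would time each group with a clock such as std::chrono::steady_clock and pass the elapsed time in on the next request.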

Answer 4

It seems that the main issue with the simple approach is that the consumer stalls for as long as it takes to fetch the next task. No useful work gets done during the stall.

Since latency, rather than bandwidth, is the main problem, one solution is to amortize the stall across multiple tasks, for example by grouping tasks into batches. To do this well, you need to have a good idea of how long each task will take to process.

An alternative is to fetch the next task in parallel with processing the current one. This can easily be done with two threads: thread A processes the current task while thread B fetches the next one. When A is done with the current task, the threads can either switch roles or pass the next task from B to A. This is a form of pipeline parallelism.
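A rough sketch of the two-thread variant in which B hands the next task to A, using std::async for the fetching side. The Task struct and the Fetch/Process callback types are placeholders invented for this example; the real fetch would be whatever call pulls a task from the producer.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>

struct Task { int id; };  // hypothetical task type

using Fetch = std::function<std::unique_ptr<Task>()>;  // returns nullptr when no task is left
using Process = std::function<void(const Task&)>;

// Processes tasks until fetch() returns nullptr; returns how many were processed.
int consume_with_prefetch(const Fetch& fetch, const Process& process) {
    assert(fetch && process);
    int processed = 0;
    std::unique_ptr<Task> cur = fetch();  // the first fetch cannot be overlapped
    while (cur) {
        // "Thread B": fetch the next task while the current one is processed.
        auto next = std::async(std::launch::async, std::cref(fetch));
        process(*cur);                    // "thread A": do the actual work
        ++processed;
        cur = next.get();                 // hand the prefetched task over to A
    }
    return processed;
}
```

As with the OpenMP version, this assumes fetching and processing share no state. On some toolchains it also needs to be linked against the thread library (for example with -pthread).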

Answer 5

If your notion of latency can be extended to two dimensions, meaning that each consumer can have a different latency, then you might try a space-filling curve (SFC). An SFC subdivides the 2D space and reduces the problem to one dimension: you compute a single number from f(x, y), sort by this number, and send to the consumers in that order. Of course, you must write the SFC before you can use it. I won't do it for you, but I can help if you run into problems.
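The answer does not say which curve to use. As one possible f(x, y), here is a sketch of the Z-order (Morton) curve, which interleaves the bits of two 16-bit coordinates into one sortable key. What x and y stand for (for instance a quantized latency and a quantized task time) is left open, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Spreads the low 16 bits of v so that they occupy the even bit positions.
std::uint32_t spread_bits(std::uint32_t v) {
    assert(v <= 0xFFFFu);
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Z-order key: bits of x on even positions, bits of y on odd positions.
std::uint32_t morton_key(std::uint32_t x, std::uint32_t y) {
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

Sorting by morton_key tends to keep points that are close in both coordinates close in the resulting order, which is the property the answer relies on.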

5 answers

Answer 1: score 1, accepted, 2011-11-09 15:38:10
Answer 2: score 1, 2011-11-09 16:03:50
Answer 3: score 1, 2011-11-10 04:15:17
Answer 4: score 0, 2011-11-09 13:46:53
Answer 5: score -1, 2011-11-09 14:09:26
