
Should the speed of a producer-consumer program increase with more threads?

2021-12-04 09:15:55 | Tags: c++, multithreading, producer-consumer

I'm trying to implement a producer-consumer program in C++. One thread fills a queue with vectors of numbers, while the other threads take vectors out of the queue (synchronized with condition variables and unique locks) and run a simple for loop over each vector, performing some operations on the numbers. The problem is that the program doesn't seem to get any faster when I use more than 2 threads. Here are some things I found while working on it:

The producer thread is faster than the consumers, so the queue fills up much faster than the consumers can process the data.
Processing a single vector from the queue takes very little time, so the consumers are constantly requesting data from the queue (I suspect this is a bottleneck because of the synchronization, but I'm not sure).

In such a program, should more threads be expected to make it faster, or is the speed independent of the thread count? Thanks for any answers or explanations!

1 answer

Welcome to Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law

If synchronization takes significant time compared to the computation, you cannot expect much speedup, because the critical section is effectively single-threaded. Memory allocation and deallocation are also effectively single-threaded in your scenario, because your producer allocates from its own memory arena and the consumers have to deallocate the vectors back into that same arena.
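To put rough numbers on the Amdahl's law argument, here is a small sketch of the formula. The function name amdahl_speedup and the parameter p (the fraction of run time that can actually run in parallel) are illustrative assumptions; you would have to measure p for your own program.

```cpp
#include <cassert>

// Amdahl's law: upper bound on the speedup with n threads when a fraction p
// of the run time is parallelizable and the remaining (1 - p) is serial
// (for example, time spent holding the queue lock).
// Hypothetical helper for illustration only.
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}
```

As an example with a made-up figure: if half of the time goes into the serialized part (p = 0.5), 2 threads give at most about 1.33x, 8 threads about 1.78x, and no number of threads can exceed 2x. That matches the observation that adding threads beyond 2 barely helps.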

A good way around this is to increase the size of the work items: don't take single vectors from the queue, take several at a time. The exact size will need some benchmarking. A good starting point is roughly the L2 cache size, meaning vectors with a cumulative size of around 64-256 KiB.
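A minimal sketch of what taking several vectors per lock acquisition could look like. This is not your code: BatchQueue, pop_batch, consume, and the max_bytes budget are all hypothetical names, the element type is assumed to be int, and the summing loop is only a stand-in for the real per-number work.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

using Item = std::vector<int>;  // assumed element type

class BatchQueue {
public:
    // max_bytes: rough payload budget per batch (tune by benchmarking).
    explicit BatchQueue(std::size_t max_bytes) : max_bytes_(max_bytes) {
        assert(max_bytes > 0);
    }

    void push(Item item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        not_empty_.notify_one();
    }

    // Signals that the producer will push no more items.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();
    }

    // Blocks until data is available, then moves out several vectors under a
    // single lock acquisition. Returns an empty batch once closed and drained.
    std::vector<Item> pop_batch() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !items_.empty(); });
        std::vector<Item> batch;
        std::size_t bytes = 0;
        while (!items_.empty() && bytes < max_bytes_) {
            bytes += items_.front().size() * sizeof(int);
            batch.push_back(std::move(items_.front()));
            items_.pop_front();
        }
        return batch;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<Item> items_;
    std::size_t max_bytes_;
    bool closed_ = false;
};

// Consumer loop: one lock acquisition per batch instead of one per vector.
long long consume(BatchQueue& queue) {
    long long sum = 0;
    for (;;) {
        std::vector<Item> batch = queue.pop_batch();
        if (batch.empty()) break;  // queue closed and drained
        for (const Item& v : batch)
            for (int x : v) sum += x;  // stand-in for the real work
    }
    return sum;
}
```

Each consumer thread would run consume on a shared BatchQueue constructed with something like 128 * 1024 as the budget, and the producer would call close() when it is done. The batch is processed after the lock is released, so the time spent in the critical section per processed vector shrinks as the batch grows.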