
Should the speed of producer-consumer increase with more threads?

I'm trying to implement a producer-consumer program in C++. One thread fills a queue with vectors of different numbers, while the other threads take the vectors out of the queue (synchronized using condition variables and unique locks) and run a simple for loop over each vector, doing some operations on the numbers. The problem is that the program doesn't seem to get any faster if I use more than 2 threads. Here is what I have found while working on it:

  • The producer thread is faster than the consumers, meaning the queue fills up far faster than the consumers are able to process the data
  • Processing a single vector from the queue takes a very short time, meaning the consumers are constantly asking the queue for data (I see this as a possible bottleneck due to the synchronization, but I am not sure)

In such a program, is it expected that more threads would make the program faster, or is the speed constant regardless of thread count? Thanks for any answers or explanations!

Welcome to Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law

If your synchronization takes significant time compared to the computation, you cannot expect much speedup, because the critical section is effectively single-threaded. Memory allocation and deallocation are also effectively single-threaded in your scenario, because your producer allocates from its own memory arena and the consumers need to deallocate the vectors into that same arena.
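To put rough numbers on that, Amdahl's law can be written as below. This is only an illustration: $p$ (the fraction of the run time that can actually run in parallel), $n$ (the number of consumer threads), and the value $p = 0.5$ are assumptions chosen for the example, not measurements of your program.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}
\qquad
p = 0.5:\quad
S(2) = \frac{1}{0.5 + 0.25} \approx 1.33,\quad
S(4) = \frac{1}{0.5 + 0.125} = 1.6,\quad
\lim_{n \to \infty} S(n) = \frac{1}{0.5} = 2
```

So if half of the time went into the lock and the allocator, no number of threads could make the program more than twice as fast, and each additional thread would add less than the one before it.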

A good way around this is to increase the size of the work items. Don't take a single vector from the queue at a time, but several. The exact size will need some benchmarking. A good starting point is roughly the size of the L2 cache, meaning vectors with a cumulative size of around 64-256 KiB.
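As a minimal sketch of that batching idea (not your actual code), a consumer could pull several vectors per lock acquisition. Everything named here is hypothetical: `BatchQueue`, `WorkItem`, `push`, `close`, `pop_batch` and the `max_bytes` parameter are made up for illustration, and the elements are assumed to be `int`.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical work item type; the question does not say what the vectors hold.
using WorkItem = std::vector<int>;

class BatchQueue {
public:
    // Producer side: enqueue one vector and wake one waiting consumer.
    void push(WorkItem item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }

    // Producer side: signal that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Consumer side: block until work is available, then take whole vectors
    // until their combined payload reaches max_bytes (always at least one,
    // so the batch may overshoot max_bytes by part of the last vector).
    // An empty result means the queue is closed and fully drained.
    std::vector<WorkItem> pop_batch(std::size_t max_bytes) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        std::vector<WorkItem> batch;
        std::size_t bytes = 0;
        while (!items_.empty() && (batch.empty() || bytes < max_bytes)) {
            bytes += items_.front().size() * sizeof(int);
            batch.push_back(std::move(items_.front()));
            items_.pop_front();
        }
        return batch;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<WorkItem> items_;
    bool closed_ = false;
};
```

Each consumer would then loop on something like `auto batch = q.pop_batch(128 * 1024);`, process every vector in `batch` without holding the lock, and stop when the batch comes back empty. The lock is taken once per batch instead of once per vector, which shrinks the serialized part of the run time. It does nothing for the allocator contention mentioned above, and the 128 KiB figure is just a starting value to benchmark.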
