What is the performance impact of having more OpenMP threads than work?
Consider the following example where the individual jobs are independent (no synchronization needed between the threads):
#pragma omp parallel num_threads(N)
{
#pragma omp for schedule(dynamic) nowait
for (int i = 0; i < jobs; ++i)
{
...
}
}
If N = 4 and jobs = 3, I doubt there will be much of a performance hit from creating and destroying the extra thread, but if N = 32, I'm wondering about the impact of creating/destroying the unused threads. Is it something we should even worry about?
First of all, the most general way to express your code is:
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < jobs; ++i)
{
}
Assume that the implementation has a good default.
Before you go any further, measure. Sure, sometimes it can be necessary to help out the implementation, but don't do that blindly. Most of what follows is implementation dependent, so looking at the standard doesn't help you a lot.
If you still want to specify the number of threads manually, you might as well give it std::min(N, jobs).
Here are some things to look out for that could influence the performance in your case:
OMP_WAIT_POLICY matters in your case, as it defines how waiting threads behave. Here, the excess threads will wait at the implicit barrier at the end of the parallel region. With active, threads use some form of busy waiting; with passive, threads will sleep. A busy-waiting thread could use resources of the computing threads, e.g. power budget that could otherwise be used to increase the turbo frequency of the computing threads.