
What is the performance impact of having more OpenMP threads than work?

Consider the following example, where the individual jobs are independent (no synchronization is needed between the threads):

#pragma omp parallel num_threads(N)
{
    #pragma omp for schedule(dynamic) nowait
    for (int i = 0; i < jobs; ++i)
    {
        ...
    }
}

If N = 4 and jobs = 3, I doubt there will be much of a performance hit from creating and destroying the one extra thread, but if N = 32, I wonder about the cost of creating and destroying the unused threads. Is it something we should even worry about?

First of all, the most general way to express your code is:

#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < jobs; ++i)
{
}

Assume that the implementation picks a good default number of threads.

Before you go any further, measure. Sure, sometimes it is necessary to help the implementation out, but don't do that blindly. Most of what follows is implementation-dependent, so looking at the standard won't help you much.

If you still specify the number of threads manually, you might as well give it std::min(N, jobs), so that you never request more threads than there are jobs.

Here are some things to look out for that could influence the performance in your case:

  • Don't worry too much about the overhead of spawning unnecessary threads. Implementations mitigate that with thread pools. That doesn't mean it's always perfect, so measure.
  • Do not oversubscribe unless you know what you are doing. Use at most as many threads as there are cores. This is general advice.
  • The OMP_WAIT_POLICY setting matters in your case, as it defines how waiting threads behave. Here, the excess threads will wait at the implicit barrier at the end of the parallel region. Implementations are free to do what they want with the setting, but you may assume that with active, threads use some form of busy waiting, and with passive, threads sleep. A busy-waiting thread can use up resources of the computing threads, e.g. power budget that could otherwise be used to increase the turbo frequency of the computing threads. Busy-waiting threads also waste energy. With oversubscription, the impact of active waiting is much worse.


 