Should I use GNU parallel mode functions inside an OpenMP parallel region (for-loop, tasks)?
I have a program accelerated by OpenMP. Inside the parallel region, functions like std::nth_element, std::sort, and std::partition are called. These functions are used to process each OpenMP thread's corresponding part of an array.
Recently, I found that g++ has implemented parallel versions of the above functions, so I wonder: should I use functions like __gnu_parallel::nth_element inside a #pragma omp task or #pragma omp for region? If I used the parallel mode, would the total number of threads exceed the limit set by omp_set_num_threads() and lead to worse speedup?
Trivial (and best) answer: benchmark and post your findings.
Less definitive: in my experience, the parallel versions of most algorithms are less efficient than the comparable serial ones, instead relying on multiple parallel processors to compensate in wall time.

Regarding the number of threads, I don't think OMP will spawn new threads if it is already at the limit. I do remember that nested #pragma omp for regions don't actually result in each of the outer threads spawning more "inner threads" without a specific flag (which I don't remember off the top of my head).