
Should I use GNU parallel mode functions inside an OpenMP parallel region (for loop, tasks)?

I have a program accelerated with OpenMP. Inside the parallel region it calls functions such as std::nth_element, std::sort, and std::partition. Each OpenMP thread uses these functions to process its own part of an array.

Recently I found that g++ implements parallel versions of these functions, so I wonder whether I should call something like __gnu_parallel::nth_element inside a #pragma omp task or #pragma omp for region. If I use the parallel mode, will the total number of threads exceed the limit set by omp_set_num_threads() and lead to worse speedup?

Trivial (and best) answer: benchmark and post your findings.
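As a starting point for such a benchmark, here is a minimal timing sketch. The names wall_seconds and make_input are hypothetical helpers, not part of OpenMP or libstdc++, and a single run like this is only a rough measurement; a real comparison would repeat each variant several times.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed wall time
// in seconds, measured with a monotonic clock.
template <typename F>
double wall_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: builds a deterministic, unsorted input so that
// every variant under test sees exactly the same data.
std::vector<int> make_input(std::size_t n) {
    std::vector<int> v(n);
    unsigned x = 12345u;
    for (auto& e : v) {
        x = x * 1664525u + 1013904223u;  // simple linear congruential step
        e = static_cast<int>(x >> 8);    // keep the higher-quality bits
    }
    return v;
}
```

To compare the two approaches, one could time the existing OpenMP region that calls std::nth_element on per-thread chunks against a call to __gnu_parallel::nth_element (from <parallel/algorithm>, built with -fopenmp) on a copy of the same input, passing each as a lambda to wall_seconds.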

Less definitive: in my experience, the parallel versions of most algorithms are less efficient than the comparable serial ones and rely on multiple processors to make up the difference in wall time. Regarding the number of threads, I don't think OpenMP will spawn new threads if it is already at the limit. I do remember that nested #pragma omp for regions don't actually cause each outer thread to spawn more "inner threads" unless a specific flag is set (which I don't remember off the top of my head).
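To illustrate the setup from the question, here is a sketch that keeps a single level of parallelism: the outer OpenMP loop hands out chunks and each thread runs the serial std::nth_element on its own chunk. The function names chunk_medians and allowed_active_levels are hypothetical. As far as I know, the setting the answer alludes to is nested parallelism, which the standard OpenMP routine omp_get_max_active_levels() reports and which is typically controlled through OMP_MAX_ACTIVE_LEVELS or omp_set_max_active_levels(); the default is implementation-defined, so check your runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: splits `data` into `num_chunks` contiguous chunks and
// moves each chunk's upper median to the middle position of that chunk.
void chunk_medians(std::vector<int>& data, int num_chunks) {
    assert(num_chunks > 0);
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    // The outer loop is the only parallel level; the pragma is ignored
    // if the code is built without OpenMP support.
    #pragma omp parallel for schedule(static)
    for (int c = 0; c < num_chunks; ++c) {
        const std::ptrdiff_t lo = n * c / num_chunks;
        const std::ptrdiff_t hi = n * (c + 1) / num_chunks;
        if (lo == hi) continue;  // empty chunk, nothing to do
        // Serial algorithm inside the parallel region: one call per chunk.
        std::nth_element(data.begin() + lo,
                         data.begin() + lo + (hi - lo) / 2,
                         data.begin() + hi);
    }
}

// Hypothetical helper: reports how many nested active parallel levels the
// runtime currently allows (1 means inner regions run on a single thread).
int allowed_active_levels() {
#ifdef _OPENMP
    return omp_get_max_active_levels();
#else
    return 1;  // assumption: treat a non-OpenMP build as a single level
#endif
}
```

If allowed_active_levels() returns 1, a parallel-mode algorithm called from inside the loop would be expected to run its inner region on one thread, so it would add overhead without adding threads; that expectation is worth confirming with a benchmark on your own compiler and runtime.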
