OpenMP nesting not turning off

Question

I'm trying to manage nested parallel regions with OpenMP (4.5, via GCC 7.2.0) and I'm having some issues turning off nesting.

Sample program:

#include <stdio.h>
#include <omp.h>

void foobar() {
  int tid = omp_get_thread_num();
  #pragma omp parallel for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}

int main(void) {
  omp_set_nested(0);
  #pragma omp parallel
  {
    foobar();
  }
  printf("\n");
  foobar();
  return 0;
}

What I expect is that both the call to foobar() from the parallel region and the call from serial code print 4 lines, something like:

// parallel region foobar()
0 | 0
1 | 1
2 | 2
3 | 3
// serial region foobar()
0 | 0
0 | 1
0 | 2
0 | 3

since I am not allowing nested parallelism. However, I get 16 lines within the parallel region with the correct TID, but the OTID is always 0 (i.e. every thread appears to execute the entire loop on its own), and I get 4 lines outside (i.e. the parallel for spawns 4 threads, as I would expect).

I feel like I'm missing something very obvious here. Can anybody shed some light on this? Isn't disabling nesting supposed to turn that omp parallel for into a regular omp for and distribute the work accordingly?

Answer 1

Your issue comes from the false assumption that the omp for directive will distribute the work among the threads of whichever parallel region happens to be active. In your code, the omp for is bound only to the parallel region declared in foobar(). When that region is active (since you disabled nested parallelism, that means when foobar() is not called from another parallel region), the loop is distributed among the newly spawned threads. When foobar() is called from inside another parallel region, the inner region is not activated: each calling thread gets an inner team consisting of itself alone, so the omp for has nothing to distribute over and the loop is not shared among the calling threads. Each of them therefore executes the whole loop, which produces the repeated printf() output you see.

A possible solution would be something like this:

#include <stdio.h>
#include <omp.h>

void bar(int tid) {
  #pragma omp for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}

void foobar() {
  int tid = omp_get_thread_num();
  int in_parallel = omp_in_parallel();
  if (!in_parallel) {
    #pragma omp parallel
    bar(tid);
  }
  else {
    bar(tid);
  }
}

int main() {
  #pragma omp parallel
  foobar();
  printf("\n");
  foobar();
  return 0;
}

I don't find this solution entirely satisfying, but I don't see a better one right now. Maybe I'll get some enlightenment later...

EDIT: I had another idea: doing it the other way around and forcing nested parallelism, with only a single active thread whenever the function is called from an actual parallel region:

#include <stdio.h>
#include <omp.h>

void foobar() {
  int tid = omp_get_thread_num();
  omp_set_nested(1);
  #pragma omp single
  #pragma omp parallel for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}

int main() {
  #pragma omp parallel
  foobar();
  printf("\n");
  foobar();
  return 0;
}

This time the code looks much nicer, without any duplication, and gives (for example):

$ OMP_NUM_THREADS=4 ./nested
3 | 2
3 | 3
3 | 1
3 | 0

0 | 3
0 | 1
0 | 0
0 | 2
