OpenMP nesting not turning off

I'm trying to manage nested parallel regions with OpenMP (4.5, via GCC 7.2.0) and I'm having some issues turning off nesting.

Sample program:

#include <stdio.h>
#include <omp.h>

void foobar() {
  int tid = omp_get_thread_num();
  #pragma omp parallel for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}

int main(void) {
  omp_set_nested(0);
  #pragma omp parallel
  {
    foobar();
  }
  printf("\n");
  foobar();
  return 0;
}

What I'm expecting to happen here is both the parallel region and non-parallel call on foobar() will spit out 4 lines, something to the tune of

// parallel region foobar()
0 | 0
1 | 1
2 | 2
3 | 3
// serial region foobar()
0 | 0
0 | 1
0 | 2
0 | 3

I expect this because I am not allowing nested parallelism. However, I get 16 lines from the call inside the parallel region: the TID is correct, but the OTID is always 0 (i.e. every thread appears to spawn its own team and execute the entire loop on it). The call outside the parallel region prints 4 lines (i.e. the parallel for spawns 4 threads, as I would expect).

I feel like I'm missing something very obvious here. Can anybody shed some light for me? Isn't disabling nesting supposed to turn that omp parallel for into a regular omp for, and distribute the work accordingly?

Your issue comes from the false assumption that the omp for directive will be interpreted, and the corresponding work distributed among the threads, irrespective of which parallel region is active. In your code, the omp for is bound only to the parallel region declared in function foobar(). When that region is active (that is, since you disabled nested parallelism, when foobar() isn't called from another parallel region), your loop is distributed among the newly spawned threads. When it isn't active, because foobar() is called from another parallel region, the inner region runs with a team of just one thread, the calling one. The omp for then has nothing to distribute the loop over, and the loop is not shared among the calling threads. Each of them executes the whole loop, with omp_get_thread_num() returning 0 inside the inner region, which leads to the replication of printf() that you see.

A possible solution would be something like this:

#include <stdio.h>
#include <omp.h>

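// Orphaned worksharing loop: it binds to whichever parallel region
// is innermost at the point of the call.
void bar(int tid) {
  #pragma omp for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}

void foobar() {
  int tid = omp_get_thread_num();
  // Non-zero only inside an active parallel region.
  int in_parallel = omp_in_parallel();
  if (!in_parallel) {
    // Called from serial code: open a region for bar() to bind to.
    #pragma omp parallel
    bar(tid);
  }
  else {
    // Already in a team: share the loop among the calling threads.
    bar(tid);
  }
}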

int main() {
  #pragma omp parallel
  foobar();
  printf("\n");
  foobar();
  return 0;
}

I don't find this solution entirely satisfying, but I don't see a better one right now. Maybe some enlightenment will come later...

EDIT: well, I had another idea: doing it the other way around, by enabling nested parallelism and letting only a single thread of the calling team open the nested region whenever the function is called from an actual parallel region:

#include <stdio.h>
#include <omp.h>

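void foobar() {
  int tid = omp_get_thread_num();
  // Allow the nested region below to get its own team.
  omp_set_nested(1);
  // Only one thread of the calling team opens the nested region;
  // the others wait at the implicit barrier that ends the single.
  #pragma omp single
  #pragma omp parallel for
  for (int i = 0; i < 4; i++) {
    int otid = omp_get_thread_num();
    printf("%d | %d\n", tid, otid);
  }
}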

int main() {
  #pragma omp parallel
  foobar();
  printf("\n");
  foobar();
  return 0;
}

And this time the code looks much nicer without any duplication, and gives (for example):

$ OMP_NUM_THREADS=4 ./nested
3 | 2
3 | 3
3 | 1
3 | 0

0 | 3
0 | 1
0 | 0
0 | 2
