
OpenMP for loop on array of C++ objects

I am running a simulation in which many random numbers are generated. The RNG is implemented as a C++ object with a public method that returns the random number. To use it with OpenMP parallelization, I simply create an array of such RNG objects, one per thread. Each thread then generates its own random numbers by calling its own RNG, e.g.:

  for (int i = 0; i < iTotThreads; i++) {
    aRNG[i] = new RNG();
  }
  // ... stuff here
#pragma omp parallel 
  {
    int iT = omp_get_thread_num();  // declared inside the region so it is private
#pragma omp for
    for ( /* big loop */) {
      // more stuff
      aRNG[iT]->getRandomNumber();
      // more stuff
    }
  }  

Even though each RNG works on its own member variables and two such RNGs do not fit within a single cache line (I also tried explicitly aligning each of them at creation), there seems to be some false sharing going on as the code does not scale at all.

If I instantiate the objects within an omp parallel region:

#pragma omp parallel
  { 
    int i = omp_get_thread_num();
    aRNG[i] = new RNG();
  }

the code scales perfectly. Do you have any idea of what I am missing here?

EDIT: by the way, in the second case (the one that scales well), the parallel region in which I create the RNGs is not the same as the one in which I use them. I'm relying on the fact that, when I enter the second parallel region, every pointer in aRNG[] will still point to one of my objects, but I guess this is bad practice...

Although I doubt from your description that false sharing is the cause of your problem, why don't you simplify the code in this way:

  // ... stuff here
#pragma omp parallel 
  {
    RNG rng;
#pragma omp for
    for ( /* big loop */) {
      // more stuff
      rng.getRandomNumber();
      // more stuff
    }
  }

Being declared inside a parallel region, rng will be a private variable with automatic storage duration, so:

  • each thread will have its own private random number generator (no false sharing possible here)
  • you don't have to manage allocation/deallocation of a resource

In case this approach is infeasible, and following the suggestion of @HristoIliev, you can always declare a threadprivate variable to hold the pointer to the random number generator:

static std::shared_ptr<RNG> rng;
#pragma omp threadprivate(rng)

and allocate it in the first parallel region:

rng.reset( new RNG );

In this case, though, there are a few caveats to ensure that the value of rng is preserved across parallel regions (quoting from the OpenMP 4.0 standard):

The values of data in the threadprivate variables of non-initial threads are guaranteed to persist between two consecutive active parallel regions only if all the following conditions hold:

  • Neither parallel region is nested inside another explicit parallel region.
  • The number of threads used to execute both parallel regions is the same.
  • The thread affinity policies used to execute both parallel regions are the same.
  • The value of the dyn-var internal control variable in the enclosing task region is false at entry to both parallel regions.

If these conditions all hold, and if a threadprivate variable is referenced in both regions, then threads with the same thread number in their respective regions will reference the same copy of that variable.
