
C++ OpenMP directives for a parallel for loop?

I am trying OpenMP on a particular code snippet. I'm not sure if the snippet needs a revamp; perhaps it is set up too rigidly for a sequential implementation. Anyway, here is the (pseudo-)code that I'm trying to parallelize:

#pragma omp parallel for private(id, local_info, current_local_cell_id, local_subdomain_size) shared(cells, current_global_cell_id, global_id)
for(id = 0; id < grid_size; ++id) {
   local_info = cells.get_local_subdomain_info(id);
   local_subdomain_size = local_info.size();
   ...do other stuff...
   do {
      current_local_cell_id = cells.get_subdomain_cell_id(id);
      global_id.set(id, current_global_cell_id + current_local_cell_id);
   } while(id < local_subdomain_size && ++id);
   current_global_cell_id += local_subdomain_size;
}

This makes complete sense (after staring at it for some time) in a sequential setting, which might also mean that it needs to be rewritten for OpenMP. My concern is that current_local_cell_id and local_subdomain_size are private, but current_global_cell_id and global_id are shared.

Hence the statement current_global_cell_id += local_subdomain_size after the inner loop:

do {
  ...
} while(...)
current_global_cell_id += local_subdomain_size;

might lead to errors in the OpenMP setting, I suspect. I would greatly appreciate it if any of the OpenMP experts out there could point me to the special OMP directives I can use to make minimal changes to the code while still benefiting from OpenMP for this type of for loop.

I'm not sure I understand your code. However, I think you really want some kind of parallel accumulation.

You could use a pattern like

 size_t total = 0;
 // Note: `total` must not also appear in a shared() clause; the reduction
 // clause already gives each thread a private copy and combines them at the end.
 #pragma omp parallel for reduction(+:total)
 for (int i = 0; i < MAXITEMS; i++)
 {
      total += getvalue(i); // TODO replace with your logic
 }

 // total has been 'magically' combined by OMP
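
A minimal, self-contained version of that pattern (the vector of values is a stand-in for getvalue, which isn't shown in the original):

```cpp
#include <cstddef>
#include <vector>

// Self-contained version of the reduction pattern above. Each thread
// accumulates into a private copy of `total`; OpenMP combines the copies
// with + when the loop finishes. Compiled without -fopenmp the pragma is
// simply ignored and the function sums sequentially, with the same result.
std::size_t sum_values(const std::vector<std::size_t>& values) {
    std::size_t total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < static_cast<long>(values.size()); ++i)
        total += values[i];  // stand-in for getvalue(i)
    return total;
}
```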

On a related note, when you use GCC you can just use __gnu_parallel::accumulate as a drop-in replacement for std::accumulate, which does exactly the same thing. See Chapter 18. Parallel Mode in the libstdc++ manual.

 #include <parallel/numeric>  // libstdc++ parallel mode header
 size_t total = __gnu_parallel::accumulate(c.begin(), c.end(), 0, &myvalue_accum);

You can even compile with -D_GLIBCXX_PARALLEL, which makes all uses of the std algorithms automatically parallelized where possible. Don't use that unless you know what you're doing! Frequently, performance just suffers, and the chance of introducing bugs due to unexpected parallelism is real.

Changing id inside the loop is not correct. There is no way to dispatch the iterations to different threads, because the loop step no longer produces predictable id values.

Why are you using the id inside that do while loop?
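
Following up on those comments: the loop in the question can be restructured so that id is never modified inside the body and the shared running current_global_cell_id is replaced by precomputed offsets. The sketch below assumes the subdomain data has been flattened into plain vectors; local_ids and assign_global_ids are illustrative names, not part of the question's cells API:

```cpp
#include <cstddef>
#include <vector>

// local_ids[s] holds the local cell ids of subdomain s (a stand-in for
// cells.get_subdomain_cell_id in the question). The returned vector maps
// each flattened cell position to subdomain-offset + local cell id, which
// is what the original do/while computed via a shared running counter.
std::vector<std::size_t> assign_global_ids(
        const std::vector<std::vector<std::size_t>>& local_ids) {
    const std::size_t n = local_ids.size();

    // Step 1 (sequential, cheap): an exclusive prefix sum of subdomain
    // sizes replaces the shared current_global_cell_id accumulation.
    std::vector<std::size_t> offset(n + 1, 0);
    for (std::size_t s = 0; s < n; ++s)
        offset[s + 1] = offset[s] + local_ids[s].size();

    std::vector<std::size_t> global_id(offset[n]);

    // Step 2: each iteration writes a disjoint slice of global_id, so the
    // loop carries no cross-iteration dependency. Without -fopenmp the
    // pragma is ignored and the loop simply runs sequentially.
    #pragma omp parallel for
    for (long s = 0; s < static_cast<long>(n); ++s)
        for (std::size_t i = 0; i < local_ids[s].size(); ++i)
            global_id[offset[s] + i] = offset[s] + local_ids[s][i];

    return global_id;
}
```

With the offsets known up front, only the cheap prefix sum stays sequential; the expensive per-cell work parallelizes cleanly because no iteration reads what another writes.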
