
OpenMP multithreading in C++

This follows on from one of my previous posts, efficient approach in using c++ thread/ boost thread.

I know that I could convert a serial program into a parallelized one using OpenMP by changing:

int thread_count = 8;
for (int i = 1; i < 100000; i++)
{
    do_work(i, record);
}

into

int thread_count = 8;
#pragma omp parallel for
for (int i = 1; i < 100000; i++)
{
    do_work(i, record);
}

How about fully parallelizing a nested for loop? Is it by changing

int thread_count = 8;
for (int i = 1; i < 100000; i++)
{
    for (int j = 1; j < 100000; j++)
    {
        do_work(i, j, record);
    }
}

into

int thread_count = 8;
#pragma omp parallel for
for (int i = 1; i < 100000; i++)
{
    #pragma omp parallel for
    for (int j = 1; j < 100000; j++)
    {
        do_work(i, j, record);
    }
}

for maximal parallelization? Thank you.

It usually isn't a good idea to do that. It implies nested parallelism, with the creation (or, more likely, just the management) of a thread pool at each iteration of the outermost loop.

However, if parallelising only the outermost loop really isn't sufficient for you (which it should be in most cases), you can always consider using a collapse(2) clause to fuse the i and j loops and have the whole (i,j) domain dealt with in parallel.
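As a sketch (keeping the asker's do_work and record, and assuming do_work(i, j, record) is safe to call concurrently for distinct (i, j) pairs), the collapsed version would look roughly like this:

int thread_count = 8;
// collapse(2) fuses the two loops into one large iteration space,
// which OpenMP then distributes across the team of threads.
#pragma omp parallel for collapse(2) num_threads(thread_count)
for (int i = 1; i < 100000; i++)
{
    for (int j = 1; j < 100000; j++)
    {
        do_work(i, j, record);  // must be thread-safe for distinct (i, j)
    }
}

Note that collapse requires the loops to be perfectly nested (no statements between the two for headers), which is the case here.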

A last solution for special needs, if you really want nested parallelism without the overhead of OpenMP's nested parallel regions, is to create a single parallel region and manually assign work to your threads based on their id. This isn't as simple as just adding compiler directives, but it isn't particularly complicated either... Still, you should only consider it if/when you have very specific needs that you cannot address in a satisfactory manner with the usual OpenMP constructs / philosophy.
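For illustration only, a minimal sketch of that manual approach could look like the following (the block-wise split of the i range by thread id is just one possible scheme; do_work and record are again the asker's):

#include <omp.h>

int thread_count = 8;
#pragma omp parallel num_threads(thread_count)
{
    int tid      = omp_get_thread_num();   // id of this thread within the team
    int nthreads = omp_get_num_threads();  // actual team size

    // Give each thread a contiguous chunk of the outer range [1, 100000)
    int total = 100000 - 1;
    int chunk = (total + nthreads - 1) / nthreads;
    int begin = 1 + tid * chunk;
    int end   = begin + chunk < 100000 ? begin + chunk : 100000;

    for (int i = begin; i < end; i++)
    {
        for (int j = 1; j < 100000; j++)
        {
            do_work(i, j, record);  // assumed thread-safe
        }
    }
}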

First of all, to use a specific thread count you should use:

int thread_count = 8;
#pragma omp parallel for num_threads(thread_count)
for (int i = 1; i < 100000; i++)
{
    do_work(i, record);
}

And if you want nested parallelism, you need to turn it on with omp_set_nested(1).
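For example (a hedged sketch: the 4 × 2 split is arbitrary, and on newer OpenMP runtimes omp_set_nested() is deprecated in favour of omp_set_max_active_levels(), though it still works):

#include <omp.h>

omp_set_nested(1);  // allow inner parallel regions to spawn their own teams

#pragma omp parallel for num_threads(4)
for (int i = 1; i < 100000; i++)
{
    #pragma omp parallel for num_threads(2)  // 4 outer * 2 inner = 8 threads in total
    for (int j = 1; j < 100000; j++)
    {
        do_work(i, j, record);
    }
}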

If each thread is doing a similar job, then in order to get the best performance out of parallelization you should make sure that the total number of threads matches the number of cores / virtual processors (in the case of hyper-threading); use omp_get_max_threads() to check it. And if you use nested parallelization, the total number of threads is the product of the thread counts at each level - so you can easily end up creating more threads than your virtual processors can effectively support.
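For example, a quick check along these lines (assuming you just want to print the two numbers) shows the default team size and the number of logical processors OpenMP sees:

#include <cstdio>
#include <omp.h>

int main()
{
    // Default team size OpenMP would use for a parallel region
    printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
    // Number of logical processors visible to the program
    printf("omp_get_num_procs()   = %d\n", omp_get_num_procs());
    return 0;
}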

The way you suggested will not give you a performance increase, since every thread will still be executing a single do_work(...). However, if a single do_work() is long enough and itself contains some loops, you might get some speed boost by applying a second level of parallel processing inside of it. That way your threads run tasks of different length, and the scheduler may squeeze in some short tasks if there are resources available at a given moment.

But for this I would not recommend nested OMP - in my experiments, applying a second level of #pragma omp for actually degraded the speed. Yet, you might still get some improvement if you use different multithreading mechanisms, for example OMP for the outer parallelization and a boost thread pool or WinAPI _beginthreadex(...) for the inner loops, as in the sketch below.
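A rough sketch of that mixed approach, using std::thread as a portable stand-in for a boost thread pool or _beginthreadex(...) (process_j, the 4-thread outer / 2-thread inner split, and the omission of record are all illustrative choices of mine):

#include <algorithm>
#include <omp.h>
#include <thread>
#include <vector>

// Placeholder for the per-(i, j) work from the original code (record omitted).
void process_j(int i, int j_begin, int j_end)
{
    for (int j = j_begin; j < j_end; j++)
    {
        // ... do the actual (i, j) work here ...
    }
}

// do_work() splits its inner loop over a couple of std::thread workers.
void do_work(int i)
{
    const int inner_threads = 2;
    const int n = 100000;
    const int chunk = (n - 1 + inner_threads - 1) / inner_threads;

    std::vector<std::thread> workers;
    for (int t = 0; t < inner_threads; t++)
    {
        int begin = 1 + t * chunk;
        int end = std::min(begin + chunk, n);
        if (begin < end)
            workers.emplace_back(process_j, i, begin, end);
    }
    for (auto& w : workers)
        w.join();
}

int main()
{
    // OpenMP handles the outer loop; each do_work() call manages its own inner threads.
    #pragma omp parallel for num_threads(4)
    for (int i = 1; i < 100000; i++)
    {
        do_work(i);
    }
    return 0;
}

In a real implementation you would reuse a pool of worker threads instead of creating and joining threads on every do_work() call, which is exactly what a boost thread pool gives you.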
