
Differences between Shared and Private in OpenMP (C++)

I am trying to parallelize my C++ code using OpenMP.

This is my first time using OpenMP, and I have a couple of questions about how to use private / shared properly.

Below is just a sample code I wrote to understand what is going on. Correct me if I am wrong.

#pragma omp parallel for
for (int x=0;x<100;x++)
{
    for (int y=0;y<100;y++)
    {
        for (int z=0;z<100;z++)
        {
             a[x][y][z]=U[x]+U[y]+U[z]; 
        }
    }
}

So by using #pragma omp parallel for, I can use multiple threads to run this loop, i.e. with 5 threads, thread #1 handles 0<=x<20, thread #2 handles 20<=x<40, ..., and thread #5 handles 80<=x<100.

All the threads run at the same time, so this should make the code faster.

Since x, y, and z are declared inside the loop, they are private (each thread has its own copy of these variables), while a and U are shared.

So each thread reads the shared variable U and writes to the shared variable a.

I have a couple of questions.

  1. What would be the difference between #pragma omp parallel for and #pragma omp parallel for private(y,z)? I think that since x, y, and z are already private, they should be the same.

  2. If I use #pragma omp parallel for private(a, U), does this mean each thread will have a copy of a and U?

For example, with 2 threads that each have a copy of a and U, thread #1 uses 0<=x<50, so it writes from a[0][0][0] to a[49][99][99], and thread #2 writes from a[50][0][0] to a[99][99][99]. After that, are these two results merged so that there is a complete version of a[x][y][z]?

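Any variable declared within a parallel block will be private (unless it is declared static). Variables mentioned in the private clause of a parallel directive follow the normal rules for variables: the variable must already be declared at the point where the clause names it. So for your first question, the two directives behave the same as long as y and z are declared inside the loop; private(y,z) is only needed, and only compiles, when y and z are declared before the directive.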
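To make the first question concrete, here is a minimal sketch of the two variants side by side. The function names, the idx helper, and the flat std::vector storage (used instead of a[x][y][z]) are assumptions made for illustration; they are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat index into an n*n*n array stored row-major.
inline std::size_t idx(int x, int y, int z, int n) {
    return (static_cast<std::size_t>(x) * n + y) * n + z;
}

// Variant 1: y and z are declared inside the loop, so every thread gets
// its own copies automatically. No private clause is needed.
void fill_declared_inside(std::vector<double>& a, const std::vector<double>& U) {
    const int n = static_cast<int>(U.size());
    assert(a.size() == static_cast<std::size_t>(n) * n * n);  // a must hold n*n*n values
    #pragma omp parallel for
    for (int x = 0; x < n; x++)
        for (int y = 0; y < n; y++)
            for (int z = 0; z < n; z++)
                a[idx(x, y, z, n)] = U[x] + U[y] + U[z];
}

// Variant 2: y and z are declared before the directive. In C/C++ they
// would be shared by default, so private(y, z) is required here.
void fill_private_clause(std::vector<double>& a, const std::vector<double>& U) {
    const int n = static_cast<int>(U.size());
    assert(a.size() == static_cast<std::size_t>(n) * n * n);  // a must hold n*n*n values
    int y, z;
    #pragma omp parallel for private(y, z)
    for (int x = 0; x < n; x++)
        for (y = 0; y < n; y++)
            for (z = 0; z < n; z++)
                a[idx(x, y, z, n)] = U[x] + U[y] + U[z];
}
```

Both functions should fill a with the same values. Dropping private(y, z) from the second one would leave y and z shared between threads, which is a data race. In both variants a and U stay shared, and each x iteration writes a different slice of a.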

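The effect of private is to create a separate copy of the variable for each thread. The copy is not initialized from the original variable (firstprivate does that), and each thread can then update its copy without worrying about changes made by other threads. At the end of the parallel block, the private values are generally lost unless other clauses are included in the parallel directive. The reduction clause is the most common of these, as it combines the results from each thread into a final result for the loop. So for your second question: with private(a, U), each thread would work on its own uninitialized a and U, and nothing would be merged afterwards. Keep a and U shared; each value of x writes a different part of a, so the threads do not interfere with each other.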
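As a rough illustration of the points above, the sketch below uses a private scratch variable, a shared output array, and a reduction. The function names and the variable tmp are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <vector>

// Each iteration writes a different element of the shared vector `out`,
// so there are no per-thread copies of `out` and nothing to merge.
void double_into_shared(std::vector<double>& out, const std::vector<double>& U) {
    assert(out.size() == U.size());  // the caller must size the output
    const int n = static_cast<int>(U.size());
    double tmp = -1.0;  // the private copies do not inherit this value
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = 2.0 * U[i];  // assign before reading: the private copy starts uninitialized
        out[i] = tmp;
    }
}

// reduction(+:total) gives each thread a private total starting at 0 and
// adds the per-thread totals into the original variable after the loop.
double sum_with_reduction(const std::vector<double>& U) {
    const int n = static_cast<int>(U.size());
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += U[i];
    }
    return total;
}
```

With GCC or Clang this is typically compiled with -fopenmp; without that flag the pragmas are ignored and the code runs serially with the same results.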
