Confused about firstprivate and threadprivate in OpenMP context

Question

Say I have packed some resources in an object, and then perform some computation based on the resources. What I normally do is to initialise the objects outside the parallel region, and then use firstprivte keywords

int main()
{
        // initialize Widget objs
         Widget Widobj{params1,params2,params3...};

        #pragma omp parallel for firstprivate(Widobj)
        for (int i=0; i< N; ++i)
          {
             // computation based on resources in Widobj
          }

}

And I think in this case, each thread will deal with the resource in Widobj independently, and I suppose each thread will have a copy of Widobj(probably a deep copy, am I right?). Now I get confused by the other keyword threadprivate , how does threadprivate work in this context? Seems to me they are very similar

Answer 1

When an object is declared firstprivate , the copy constructor is called, whereas when private is used the default constructor is called. We'll address threadprivate below. Proof (Intel C++ 15.0):

#include <iostream>
#include <omp.h>

class myclass {
    int _n;
public:
    myclass(int n) : _n(n) { std::cout << "int c'tor\n"; }

    myclass() : _n(0) { std::cout << "def c'tor\n"; }

    myclass(const myclass & other) : _n(other._n)
    { std::cout << "copy c'tor\n"; }

    ~myclass() { std::cout << "bye bye\n"; }

    void print() { std::cout << _n << "\n"; }

    void add(int t) { _n += t; }
};

myclass globalClass;

#pragma omp threadprivate (globalClass)

int main(int argc, char* argv[])
{
    std::cout << "\nBegninning main()\n";

    myclass inst(17);

    std::cout << "\nEntering parallel region #0 (using firstprivate)\n";
#pragma omp parallel firstprivate(inst)
    {
        std::cout << "Hi\n";
    }

    std::cout << "\nEntering parallel region #1 (using private)\n";
#pragma omp parallel private(inst)
    {
        std::cout << "Hi\n";
    }

    std::cout << "\nEntering parallel region #2 (printing the value of "
                    "the global instance(s) and adding the thread number)\n";
#pragma omp parallel
    {
        globalClass.print();
        globalClass.add(omp_get_thread_num());
    }

    std::cout << "\nEntering parallel region #3 (printing the global instance(s))\n";
#pragma omp parallel
    {
        globalClass.print();
    }

    std::cout << "\nAbout to leave main()\n";
    return 0;
}

gives

def c'tor

Begninning main()
int c'tor

Entering parallel region #0 (using firstprivate)
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye

Entering parallel region #1 (using private)
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye

Entering parallel region #2 (printing the value of the global instance(s) and adding the thread number)
def c'tor
0
def c'tor
0
def c'tor
0
0

Entering parallel region #3 (printing the global instance(s))
0
1
2
3

About to leave main()
bye bye
bye bye

If the copy constructor does a deep copy (which it should if you have to write your own, and does by default if you don't and have dynamically allocated data), then you get a deep copy of your object. This is as opposed to private which doesn't initialize the private copy with an existing object.

threadprivate works totally differently. To start with, it's only for global or static variables. Even more critical, it's a directive in and of itself and supports no other clauses. You write the threadprivate pragma line somewhere and later the #pragma omp parallel before the parallel block. There are other differences (where in memory the object is stored, etc.) but that's a good start.

Let's analyze the above output. First, note that on entering region #2 the default constructor is called creating a new global variable private to the thread. This is because on entering the first parallel region the parallel copy of the global variable doesn't yet exist.

Next, as NoseKnowsAll considers the most crucial difference, the thread private global variables are persistent through different parallel regions. In region #3 there is no construction and we see that the added OMP thread number from region #2 is retained. Also note that no destructor is called in regions 2 and 3, but rather after leaving main() (and only one (master) copy for some reason - the other is inst . This may be a bug...).

This brings us to why I used the Intel compiler. Visual Studio 2013 as well as g++ (4.6.2 on my computer, Coliru (g++ v5.2) , codingground (g++ v4.9.2) ) allow only POD types ( source ). This is listed as a bug for almost a decade and still hasn't been fully addressed. The Visual Studio error given is

error C3057: 'globalClass' : dynamic initialization of 'threadprivate' symbols is not currently supported

and the error given by g++ is

error: 'globalClass' declared 'threadprivate' after first use

The Intel compiler works with classes.

One more note. If you want to copy the value of the master thread variable you can use #pragma omp parallel copyin(globalVarName) . Note that this does not work with classes as in our example above (hence I left it out).

Sources: OMP tutorial : private , firstprivate , threadprivate

Confused about firstprivate and threadprivate in OpenMP context

Question

1 answers

solution1
3 ACCPTED 2015-09-02 08:06:27

Confused about firstprivate and threadprivate in OpenMP context

Question

1 answers

solution1 3 ACCPTED 2015-09-02 08:06:27

solution1
3 ACCPTED 2015-09-02 08:06:27