Confused about firstprivate and threadprivate in OpenMP context

Say I have packed some resources into an object and then perform some computation based on those resources. What I normally do is initialise the object outside the parallel region and then use the firstprivate clause:

int main()
{
    // initialize Widget objs
    Widget Widobj{params1, params2, params3...};

    #pragma omp parallel for firstprivate(Widobj)
    for (int i = 0; i < N; ++i)
    {
        // computation based on resources in Widobj
    }
}

I think that in this case each thread works with the resources in Widobj independently, and I suppose each thread gets its own copy of Widobj (probably a deep copy, am I right?). Now I am confused by the other keyword, threadprivate: how does threadprivate work in this context? The two seem very similar to me.

When an object is declared firstprivate, the copy constructor is called, whereas when private is used the default constructor is called. We'll address threadprivate below. Proof (Intel C++ 15.0):

#include <iostream>
#include <omp.h>

class myclass {
    int _n;
public:
    myclass(int n) : _n(n) { std::cout << "int c'tor\n"; }

    myclass() : _n(0) { std::cout << "def c'tor\n"; }

    myclass(const myclass & other) : _n(other._n)
    { std::cout << "copy c'tor\n"; }

    ~myclass() { std::cout << "bye bye\n"; }

    void print() { std::cout << _n << "\n"; }

    void add(int t) { _n += t; }
};

myclass globalClass;

#pragma omp threadprivate (globalClass)

int main(int argc, char* argv[])
{
    std::cout << "\nBegninning main()\n";

    myclass inst(17);

    std::cout << "\nEntering parallel region #0 (using firstprivate)\n";
#pragma omp parallel firstprivate(inst)
    {
        std::cout << "Hi\n";
    }

    std::cout << "\nEntering parallel region #1 (using private)\n";
#pragma omp parallel private(inst)
    {
        std::cout << "Hi\n";
    }

    std::cout << "\nEntering parallel region #2 (printing the value of "
                    "the global instance(s) and adding the thread number)\n";
#pragma omp parallel
    {
        globalClass.print();
        globalClass.add(omp_get_thread_num());
    }

    std::cout << "\nEntering parallel region #3 (printing the global instance(s))\n";
#pragma omp parallel
    {
        globalClass.print();
    }

    std::cout << "\nAbout to leave main()\n";
    return 0;
}

gives

def c'tor

Begninning main()
int c'tor

Entering parallel region #0 (using firstprivate)
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye

Entering parallel region #1 (using private)
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye

Entering parallel region #2 (printing the value of the global instance(s) and adding the thread number)
def c'tor
0
def c'tor
0
def c'tor
0
0

Entering parallel region #3 (printing the global instance(s))
0
1
2
3

About to leave main()
bye bye
bye bye

If the copy constructor does a deep copy (which it should if you have to write your own, and which the compiler-generated one effectively does when the class has no raw pointers to dynamically allocated data), then you get a deep copy of your object. This is as opposed to private, which doesn't initialize the private copy from an existing object.
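To make the deep-copy point concrete, here is a minimal sketch. The `Widget` class, its scratch buffer, and `sum_with_private_widgets` are hypothetical names invented for illustration; they are not from the question. The class owns a heap buffer through a raw pointer, so it needs a hand-written copy constructor, and firstprivate uses that constructor to give each thread its own buffer.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical resource holder: owns a heap buffer through a raw pointer,
// so the compiler-generated copy would be shallow and a deep copy is written by hand.
class Widget {
    std::size_t n_;
    int* scratch_;
public:
    explicit Widget(std::size_t n) : n_(n), scratch_(new int[n]()) {}

    // Deep copy: allocate a separate buffer and copy the contents.
    Widget(const Widget& other) : n_(other.n_), scratch_(new int[other.n_]) {
        std::copy(other.scratch_, other.scratch_ + n_, scratch_);
    }
    Widget& operator=(const Widget&) = delete; // not needed for this sketch
    ~Widget() { delete[] scratch_; }

    // Overwrites the scratch buffer and then sums it.
    // This would race if several threads shared one buffer.
    long eval(int i) {
        for (std::size_t k = 0; k < n_; ++k) scratch_[k] = i + static_cast<int>(k);
        long s = 0;
        for (std::size_t k = 0; k < n_; ++k) s += scratch_[k];
        return s;
    }
};

// Sums eval(i) for i in [0, N); each thread works on its own copy of w.
long sum_with_private_widgets(int N) {
    Widget w(4);
    long total = 0;
    #pragma omp parallel for firstprivate(w) reduction(+:total)
    for (int i = 0; i < N; ++i) {
        total += w.eval(i);
    }
    return total;
}
```

Calling `sum_with_private_widgets(100)` from `main` should return the same total whether or not the code is built with OpenMP enabled. If the copy constructor only copied the pointer, the threads would overwrite each other's scratch data and the buffer would be freed more than once.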

threadprivate works totally differently. To start with, it's only for global or static variables. Even more critical, it's a directive in and of itself and supports no other clauses. You write the threadprivate pragma line somewhere, and later the #pragma omp parallel before the parallel block. There are other differences (where in memory the object is stored, etc.) but that's a good start.

Let's analyze the above output. First, note that on entering region #2 the default constructor is called, creating a new global variable private to the thread. This is because, on entering the first parallel region, each thread's private copy of the global variable doesn't yet exist.

Next comes what NoseKnowsAll considers the most crucial difference: the thread-private global variables persist across different parallel regions. In region #3 there is no construction, and we see that the OMP thread number added in region #2 is retained. Also note that no destructor is called in regions #2 and #3, but only after leaving main() (and only for one copy, the master's, for some reason; the other "bye bye" is inst. This may be a bug...).
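The persistence described above can also be sketched with a plain int, which avoids the compiler restrictions on class types. The names `tally`, `thread_id`, and `tally_after_two_regions` are hypothetical. The sketch assumes `nthreads >= 1`. It also relies on the OpenMP rule that values persist between regions only when dynamic thread adjustment is off and both regions use the same number of threads, which is why it calls `omp_set_dynamic(0)` and passes the same `num_threads` value twice. The `_OPENMP` guards are only there so it also builds serially.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical file-scope variable; every thread gets its own instance.
static int tally = 0;
#pragma omp threadprivate(tally)

// Thread number inside a parallel region, or 0 when built without OpenMP.
static int thread_id() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Region 1 stores (thread id + 1) * 10 in each thread's tally;
// region 2 reads it back. Entry i holds what thread i saw, or -1 if
// the runtime supplied fewer threads than requested.
std::vector<int> tally_after_two_regions(int nthreads) {
#ifdef _OPENMP
    omp_set_dynamic(0); // persistence requires dynamic adjustment to be off
#else
    nthreads = 1;       // pragmas are ignored, so the code runs on one thread
#endif
    std::vector<int> seen(nthreads, -1);

    #pragma omp parallel num_threads(nthreads)
    {
        tally = (thread_id() + 1) * 10;
    }

    // No write happens here: the values come from the previous region.
    #pragma omp parallel num_threads(nthreads)
    {
        seen[thread_id()] = tally;
    }
    return seen;
}
```

With four threads available, `tally_after_two_regions(4)` would be expected to return 10, 20, 30, 40, mirroring the 0, 1, 2, 3 printed in region #3 of the output above.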

This brings us to why I used the Intel compiler. Visual Studio 2013 as well as g++ (4.6.2 on my computer, Coliru (g++ v5.2), codingground (g++ v4.9.2)) allow only POD types (source). This has been listed as a bug for almost a decade and still hasn't been fully addressed. The Visual Studio error given is

error C3057: 'globalClass' : dynamic initialization of 'threadprivate' symbols is not currently supported

and the error given by g++ is

error: 'globalClass' declared 'threadprivate' after first use

The Intel compiler works with classes.

One more note. If you want to copy the value of the master thread's variable, you can use #pragma omp parallel copyin(globalVarName). Note that this does not work with classes as in our example above (hence I left it out).
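As a rough illustration of copyin with a POD type, the sketch below sets the master thread's copy and then checks that every thread starts the region with that value. The names `seed` and `all_threads_see` are hypothetical and chosen only for this example.

```cpp
// Hypothetical POD global with one instance per thread.
static int seed = 0;
#pragma omp threadprivate(seed)

// Sets the master thread's copy of seed, then enters a parallel region with
// copyin(seed) and reports whether every thread observed the same value.
bool all_threads_see(int value) {
    seed = value; // runs on the calling (master) thread only
    int mismatches = 0;
    #pragma omp parallel copyin(seed) reduction(+:mismatches)
    {
        if (seed != value) ++mismatches;
    }
    return mismatches == 0;
}
```

Without the copyin clause, the other threads would keep whatever value their own `seed` held before the region (initially 0), so `all_threads_see(42)` could report a mismatch on a multi-threaded run.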

Sources: OMP tutorial: private, firstprivate, threadprivate
