
How to process return in OpenMP parallel code?

My requirement is this: every thread allocates its own memory, then processes it:

typedef struct
{
    ......
}A;

A *p[N];

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; i++) {
        p[i] = (A*)calloc(sizeof(*p[i]), N);
        if (NULL == p[i]) {
            return;
        }
        ......          
    }
}

But the compiler complains:

error: invalid exit from OpenMP structured block
     return;

So, apart from moving the memory allocation out of the #pragma omp parallel region:

for (int i = 0; i < N; i++) {
    p[i] = (A*)calloc(sizeof(*p[i]), N);
    if (NULL == p[i]) {
        return;
    }       
}
#pragma omp parallel
{
    #pragma omp for
    ......
}

Is there any better method?

You're looking for this, I think:

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < N; i++) {
        p[i] = (A*)calloc(sizeof(*p[i]), N);
        if (NULL == p[i]) {
            #pragma omp cancel for
        }
        ......          
    }
}

But you'll need to set the environment variable OMP_CANCELLATION to true for this to work.

You should try to avoid doing this, though, because cancellation is expensive.
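As a fuller illustration of the cancellation approach, here is a minimal sketch rather than a definitive implementation. The names `Item`, `cancellation_enabled` and `alloc_all_cancel` are hypothetical and do not come from the question. The sketch assumes an OpenMP 4.0 or later compiler. It records the failure in a shared flag, so the caller still learns about it when OMP_CANCELLATION is unset and the `cancel` directive has no effect.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

typedef struct { int value; } Item;   /* hypothetical stand-in for A */

/* Nonzero if the runtime honours cancel directives (OMP_CANCELLATION=true). */
int cancellation_enabled(void)
{
#ifdef _OPENMP
    return omp_get_cancellation();
#else
    return 0;
#endif
}

/* Allocates n blocks of `count` zeroed Items each.
 * Returns 0 on success, or -1 if any allocation failed (everything is freed). */
int alloc_all_cancel(Item **p, int n, int count)
{
    assert(n >= 0 && count > 0);
    int failed = 0;

    /* Pre-clear so that iterations skipped by cancellation leave NULL behind. */
    for (int i = 0; i < n; i++) p[i] = NULL;

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            /* Other threads only notice an active cancellation at a
             * cancellation point such as this one. */
            #pragma omp cancellation point for
            p[i] = (Item *)calloc((size_t)count, sizeof *p[i]);
            if (p[i] == NULL) {
                #pragma omp atomic write
                failed = 1;
                #pragma omp cancel for
            }
            /* Any per-element processing placed here should first check
             * p[i] != NULL, because cancel is a no-op when disabled. */
        }
    }

    /* Outside the structured block, so returning is legal. */
    if (failed) {
        for (int i = 0; i < n; i++) { free(p[i]); p[i] = NULL; }
        return -1;
    }
    return 0;
}
```

A caller would check the return value of `alloc_all_cancel` and bail out there; `cancellation_enabled()` can be logged at startup to confirm whether the environment variable took effect.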

You could try this:

omp_set_dynamic(0); // Explicitly turn off dynamic threads
bool cancel = false;

// reduction(||:cancel) merges the per-thread flags, avoiding a data race on cancel
#pragma omp parallel for schedule(static) reduction(||:cancel)
for (int i = 0; i < N; i++) {
    p[i] = (A*)calloc(sizeof(*p[i]),N);
    if (NULL == p[i]) cancel = true;
}
if (cancel) return; // legal here: we are outside the parallel region
#pragma omp parallel for schedule(static)
for (int i = 0; i < N; i++) {
    ......   
}

This could allocate the memory local to each core/node. I turned off dynamic adjustment of the number of threads and used schedule(static) to make sure that the threads in the second for loop access the same memory they allocated in the first for loop.

I don't know if this solution would be any better. According to this comment, it could be worse. Whether or not you have a multi-socket (NUMA) system could make a big difference.
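For reference, the two-loop idea can be wrapped in a function that also cleans up after a failed allocation. This is only a sketch under assumptions: `Block`, `alloc_then_process` and the per-element arithmetic are hypothetical placeholders for the elided struct and processing in the question. Whether the memory really ends up local to the allocating thread depends on the allocator and the operating system.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double value; } Block;   /* hypothetical stand-in for A */

/* Phase 1 allocates in parallel, phase 2 processes in parallel.
 * Returns 0 on success, or -1 if any allocation failed (everything is freed). */
int alloc_then_process(Block **p, int n, int count)
{
    assert(n >= 0 && count > 0);
    int failed = 0;

    /* Phase 1: no early exit inside the loop; each thread sets its own
     * private flag and the reduction ORs them together at the end. */
    #pragma omp parallel for schedule(static) reduction(||:failed)
    for (int i = 0; i < n; i++) {
        p[i] = (Block *)calloc((size_t)count, sizeof *p[i]);
        if (p[i] == NULL) failed = 1;
    }

    /* We are outside any OpenMP structured block here, so return is allowed. */
    if (failed) {
        for (int i = 0; i < n; i++) { free(p[i]); p[i] = NULL; }
        return -1;
    }

    /* Phase 2: the same static schedule, so with an unchanged thread count
     * each thread handles the same indices it allocated in phase 1. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < count; j++) {
            p[i][j].value = (double)i + j;   /* placeholder processing */
        }
    }
    return 0;
}
```

The caller owns the blocks on success and is expected to free each `p[i]` when finished.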
