简体   繁体   中英

How to use omp parallel for and omp simd together?

I want to test #pragma omp parallel for and #pragma omp simd for a simple matrix addition program. When I use each of them separately, I get no error and it seems fine. But, I want to test how much performance can be gained using both of them. If I use #pragma omp parallel for before the outer loop and #pragma omp simd before the inner loop I get no error as well. The error occures when I use both of them before the outer loop. I get an error at runtime not compile time. ICC and GCC return error but Clang doesn't. It might be because Clang regect the parallelization. In my experiments, Clang does not parallelize and run the program with only one thread.

The program is here:

#include <stdio.h>
//#include <x86intrin.h>
#define N 512
#define M N

int __attribute__(( aligned(32))) a[N][M],
    __attribute__(( aligned(32))) b[N][M],
    __attribute__(( aligned(32))) c_result[N][M];

int main()
{
    int i, j;
    #pragma omp parallel for
    #pragma omp simd
    for( i=0;i<N;i++){
        for(j=0;j<M;j++){
            c_result[i][j]= a[i][j] + b[i][j];
        }
    }

    return 0;
}

The error for: ICC:

IMP1.c(20): error: omp directive is not followed by a parallelizable for loop #pragma omp parallel for ^

compilation aborted for IMP1.c (code 2)

GCC:

IMP1.c: In function 'main':

IMP1.c:21:10: error: for statement expected before '#pragma' #pragma omp simd

Because in my other testes pragma omp simd for outer loop gets better performance I need to put that there (don't I?).

Platform: Intel Core i7 6700 HQ, Fedora 27

Tested compilers: ICC 18, GCC 7.2, Clang 5

Compiler command line:

icc -O3 -qopenmp -xHOST -no-vec

gcc -O3 -fopenmp -march=native -fno-tree-vectorize -fno-tree-slp-vectorize

clang -O3 -fopenmp=libgomp -march=native -fno-vectorize -fno-slp-vectorize

From OpenMP 4.5 Specification:

2.11.4 Parallel Loop SIMD Construct

The parallel loop SIMD construct is a shortcut for specifying a parallel construct containing one loop SIMD construct and no other statement.

The syntax of the parallel loop SIMD construct is as follows:

#pragma omp parallel for simd ...

You can also write:

#pragma omp parallel
{
   #pragma omp for simd
   for ...
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM