简体   繁体   中英

OpenMP with 1 thread slower than sequential version

I have implemented knapsack using OpenMP (gcc version 4.6.3)

#define MAX(x,y)   ((x)>(y) ? (x) : (y))
#define table(i,j)    table[(i)*(C+1)+(j)]

   for(i=1; i<=N; ++i) {
#pragma omp parallel for
      for(j=1; j<=C; ++j) {
         if(weights[i]>j) {
            table(i,j) = table(i-1,j);
         }else {
            table(i,j) = MAX(profits[i]+table(i-1,j-weights[i]), table(i-1,j));
         }
      }
   }

execution time for the sequential program = 1s

execution time for the openmp with 1 thread = 1.7s (overhead = 40%)

Used the same compiler optimization flags (-O3) in the both cases.

Can someone explain the reason behind this behavior.

Thanks.

Enabling OpenMP inhibits certain compiler optimisations, eg it could prevent loops from being vectorised or shared variables from being kept in registers. Therefore OpenMP-enabled code is usually slower than the serial and one has to utilise the available parallelism to offset this.

That being said, your code contains a parallel region nested inside the outer loop. This means that the overhead of entering and exiting the parallel region is multiplied N times. This only makes sense if N is relatively small and C is significantly larger (like orders of magnitude larger) than N , therefore the work being done inside the region greatly outweighs the OpenMP overhead.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM