Below is my function that I'm trying to optimize using OpenMP and loop tiling (a.k.a. loop blocking). However, out ends up with the wrong value after I apply the loop tiling shown below. Can someone look over my code and point out what is wrong? Thank you so much.
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
#include "utils.h"
const long BLOCK_SIZE = 8*DIM;
int i, j, k,ii,jj,kk, dim = DIM-1;
long compute, out = 1.0, we_need, gimmie;
void work_it_par(long *old, long *new)
{
    we_need = need_func();
    gimmie = gimmie_func();
    #pragma omp parallel for private(i,j,k,ii,jj,kk, compute) firstprivate(we_need, gimmie, dim,old,BLOCK_SIZE) reduction(+:out) num_threads(omp_get_num_procs())
    for (ii=1; ii<dim-BLOCK_SIZE; ii+=BLOCK_SIZE) {
        for (jj=1; jj<dim-BLOCK_SIZE; jj+=BLOCK_SIZE) {
            for (kk=1; kk<dim-BLOCK_SIZE; kk+=BLOCK_SIZE) {
                for (i=ii; i<ii+BLOCK_SIZE; i++) {
                    for (j=jj; j<jj+BLOCK_SIZE; j++) {
                        for (k=kk; k<kk+BLOCK_SIZE; k++) {
                            //int temp = i*DIM*DIM+j*DIM+k;
                            compute = old[i*DIM*DIM+j*DIM+k] * we_need;
                            out += compute / gimmie;
                        }
                    }
                }
            }
        }
    }
    printf("AGGR:%ld\n",out);
}
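First of all, const long BLOCK_SIZE = 8*DIM; seems super fishy to me: with dim = DIM-1, the bound dim-BLOCK_SIZE is negative, so the ii loop never runs at all and out keeps its initial value. Maybe replacing the * by a / would be more of what you wanted?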
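To make the point about the block size concrete, here is a minimal sketch that counts how many tiles the outer loop header from the question would visit. The function name count_tiles and the parameter dim_total (standing in for DIM, whose real value lives in utils.h and is not shown) are assumptions made for illustration only.

```c
#include <assert.h>

/* Counts how many tiles the question's outer loop header
   `for (ii = 1; ii < dim - block; ii += block)` would visit.
   dim_total stands in for DIM (hypothetical parameter). */
long count_tiles(long dim_total, long block)
{
    assert(block > 0);
    long dim = dim_total - 1;   /* mirrors `dim = DIM-1` in the question */
    long tiles = 0;
    for (long ii = 1; ii < dim - block; ii += block)
        tiles++;
    return tiles;
}
```

Assuming DIM were 64, count_tiles(64, 8*64) visits no tile at all, while count_tiles(64, 64/8) visits some tiles but stops before the last indexes below dim, which is why the loop limits still need attention.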
Even then, you still have to deal with the limits by checking that the i, j and k indexes do not go over their limits. I'll let you figure out how to achieve that.
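One possible way to handle the limits is to clamp the end of each tile to the loop bound, so the last (possibly partial) tile is still processed and no index runs past the end. The sketch below is serial and assumes the untiled loops ran over 1 <= i, j, k < DIM-1, which the question does not show; the names sum_plain, sum_tiled, min_long and check_tiling, and the parameter n standing in for DIM, are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

static long min_long(long a, long b) { return a < b ? a : b; }

/* Untiled reference (assumed range: 1 <= i, j, k < n-1). */
long sum_plain(const long *old, long n, long we_need, long gimmie)
{
    long out = 0;
    for (long i = 1; i < n - 1; i++)
        for (long j = 1; j < n - 1; j++)
            for (long k = 1; k < n - 1; k++)
                out += old[i*n*n + j*n + k] * we_need / gimmie;
    return out;
}

/* Tiled version: tiles start anywhere below the bound, and each
   tile's end is clamped so partial tiles are handled correctly. */
long sum_tiled(const long *old, long n, long block, long we_need, long gimmie)
{
    assert(block > 0 && gimmie != 0);
    long end = n - 1, out = 0;
    for (long ii = 1; ii < end; ii += block) {
        long i_end = min_long(ii + block, end);
        for (long jj = 1; jj < end; jj += block) {
            long j_end = min_long(jj + block, end);
            for (long kk = 1; kk < end; kk += block) {
                long k_end = min_long(kk + block, end);
                for (long i = ii; i < i_end; i++)
                    for (long j = jj; j < j_end; j++)
                        for (long k = kk; k < k_end; k++)
                            out += old[i*n*n + j*n + k] * we_need / gimmie;
            }
        }
    }
    return out;
}

/* Fills an n*n*n grid with small values and compares both sums.
   Returns 1 when they agree, 0 otherwise (or on allocation failure). */
int check_tiling(long n, long block)
{
    long total = n * n * n;
    long *old = malloc((size_t)total * sizeof *old);
    if (old == NULL) return 0;
    for (long t = 0; t < total; t++) old[t] = t % 7;
    int same = sum_tiled(old, n, block, 3, 2) == sum_plain(old, n, 3, 2);
    free(old);
    return same;
}
```

Calling check_tiling with a block size that does not divide the range evenly, or one larger than the whole range, exercises the clamped partial tiles.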
Last point on the algorithm: are you sure your loops have to start from index 1?
Finally, a few notes on the OpenMP correctness:
firstprivate(we_need, gimmie, dim,old,BLOCK_SIZE) doesn't make much sense. These could happily stay shared.
I'm not sure whether num_threads(omp_get_num_procs()) is correct or not. My feeling is that it is indeed valid, but just for "safety", I would tend to separate the call to the function from the directive (by either calling the function first, storing its result in a constant and using that in the directive, or calling omp_set_num_threads() before the parallel directive).
You might also consider the collapse clause to increase the level of parallelism you achieve here.
Good luck with your code.
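As a rough illustration of the OpenMP notes above, the sketch below leaves the read-only inputs shared, declares the loop indexes inside the loops so they are private automatically, evaluates omp_get_num_procs() before the directive, and uses collapse(3) with a reduction. It is untiled to keep the focus on the clauses, and it assumes the iteration range 1 <= i, j, k < n-1; the function name sum_par and the parameter n (standing in for DIM) are hypothetical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Parallel sum of old[i][j][k] * we_need / gimmie over the interior
   of an n*n*n grid. Compiles with or without OpenMP enabled. */
long sum_par(const long *old, long n, long we_need, long gimmie)
{
    assert(gimmie != 0);
    long out = 0;
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_num_procs();   /* evaluated before the directive */
#endif
    /* old, n, we_need and gimmie are only read, so they stay shared. */
    #pragma omp parallel for collapse(3) reduction(+:out) num_threads(nthreads)
    for (long i = 1; i < n - 1; i++)
        for (long j = 1; j < n - 1; j++)
            for (long k = 1; k < n - 1; k++)
                out += old[i*n*n + j*n + k] * we_need / gimmie;
    return out;
}
```

Because the reduction operates on integers, the parallel result should match a serial loop over the same range (overflow aside); the same clauses can be applied to the outer tile loops once the tiling limits are fixed.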