Loop Tiling Optimisations

I've been trying to optimise one of the loops in my C code so that it uses the cache more efficiently, and I have a few issues. I'm not sure I'm even writing the loop blocking correctly, because I see no improvement in my program's run time. Here is the code:

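for (int k = 0; k < N; k += b) {
  /* The block bound uses the block start k; with i + b the test never ends a block before N. */
  for (int i = k; i < MIN(N, k + b); ++i) {
    ax[i] = 0.0f;
    ay[i] = 0.0f;
    for (int j = 0; j < N; j++) {
      rx = x[j] - x[i];            /* separation from i to j */
      ry = y[j] - y[i];
      r2 = rx*rx + ry*ry + eps;    /* softened squared distance */
      r2inv = 1.0f / sqrt(r2);
      r6inv = r2inv * r2inv * r2inv;
      s = m[j] * r6inv;
      ax[i] += s * rx;
      ay[i] += s * ry;
    }
  }
}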

I also have another issue: how do I go about choosing a good block size? I understand that you want to load enough data to fill the L1 cache.

Thanks for the help in advance.

What you are doing is rather pointless, because i goes from 0 to N-1 in your code, just in a slightly more complicated way. So you benefit exactly zero from your attempts at tiling.

What matters more is the data indexed by j (x[j], y[j] and m[j]), so that is what you should be tiling (if N is large, and if the speed isn't limited by the division and square root). For every value of i, you make one complete pass through those arrays. You can also easily save a few floating-point operations for each j, and since r6inv is symmetric in i and j, only half of its values need to be calculated.


 