
Why does loop order matter when there's strided prefetching?

In C you're told to iterate through a matrix in row-major order, since that's how arrays are stored under the hood, and row-major iteration utilizes the whole cache line, which leads to fewer cache misses. And indeed, I do see a massive performance difference between row-major and column-major iteration on my machine. Test code:

#include <stdio.h>
#include <stdlib.h>

#include <time.h>
#include <sys/resource.h>

/* Returns a monotonic timestamp in milliseconds (truncated to int). */
int getTime()
{
  struct timespec tsi;

  clock_gettime(CLOCK_MONOTONIC, &tsi);
  double elaps_s = tsi.tv_sec;
  long elaps_ns = tsi.tv_nsec;
  return (int) ((elaps_s + ((double)elaps_ns) / 1.0e9) * 1.0e3);
}

#define N 1000000
#define M 100

int main(void)
{
  int *src = malloc(sizeof(int) * N * M);
  int **arr = malloc(sizeof(int*) * N);
  for(int i = 0; i < N; ++i)
    arr[i] = &src[i * M];

  for(int i = 0; i < N; ++i)
    for(int j = 0; j < M; ++j)
      arr[i][j] = 1;

  int total = 0;

  int pre = getTime();


  /* Column-major traversal: stride of M ints between consecutive accesses. */
  for(int j = 0; j < M; ++j)
    for(int i = 0; i < N; ++i)
      total += arr[i][j];

  /*
  Row-major traversal: contiguous accesses.
  for(int i = 0; i < N; ++i)
    for(int j = 0; j < M; ++j)
      total += arr[i][j];
  */

  int post = getTime();

  printf("Result: %d, took: %d ms\n", total, post - pre);
}

However, modern memory systems have prefetchers that can predict strided accesses, and iterating down a column follows a very regular stride. Shouldn't this allow column-major iteration to perform similarly to row-major iteration?

A cache line has a fixed size (for example, 64 bytes), and the processor reads and writes complete cache lines. Compare the number of bytes your loop actually processes with the number of bytes the memory system must read and write: a prefetcher can hide latency, but it cannot reduce that traffic.
