
1 or 3 dimensional array?

The author of this topic claims that accessing a 1D array converted from a 2D array with fixed lengths is much faster than accessing the original 2D array, at least in C#. I wonder whether this also applies to C/C++.

When using a pointer-based 3D array, the value at (x, y, z) is fetched by dereferencing the pointer to the array three times:

int val = arr[x][y][z];

But you can convert the array to a 1D array and compute a flat index from the coordinates, so the code becomes:

int val = arr[SIZE_X * SIZE_Y * z + SIZE_X * y + x];

This replaces the three dereferencing operations with one dereference, three multiplications, and two additions (or two multiplications, if SIZE_X * SIZE_Y is precomputed).

The question is: is dereferencing three times slower or faster than computing the index from the coordinates?

Benchmark test output:

3 dimensions: 5s
1 dimension: 14s
1 dimension fast: 4s

code:

#include <iostream>
#include <ctime>

int main(int argc, char** argv)
{
    const int SIZE_X = 750, SIZE_Y = SIZE_X, SIZE_Z = SIZE_X;
    const int SIZE_XY = SIZE_X * SIZE_Y;

    time_t startTime;

    // 3 dimensions
    time(&startTime);
    int ***array3d = new int **[SIZE_X];
    for (int x = 0; x < SIZE_X; ++x)
    {
        array3d[x] = new int *[SIZE_Y];
        for (int y = 0; y < SIZE_Y; ++y)
            array3d[x][y] = new int[SIZE_Z];
    }

    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array3d[x][y][z] = 0;

    for (int x = 0; x < SIZE_X; ++x)
    {
        for (int y = 0; y < SIZE_Y; ++y)
            delete[] array3d[x][y];
        delete[] array3d[x];
    }
    delete[] array3d;

    std::cout << "3 dimensions: " << time(0) - startTime << "s\n";

    time(&startTime);
    int *array1d = new int[SIZE_X * SIZE_Y * SIZE_Z];
    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array1d[x + SIZE_X * y + SIZE_XY * z] = 0;
    delete[] array1d;
    std::cout << "1 dimension: " << time(0) - startTime << "s\n";

    time(&startTime);
    array1d = new int[SIZE_X * SIZE_Y * SIZE_Z];
    int i = 0;
    for (int x = 0; x < SIZE_X; ++x)
        for (int y = 0; y < SIZE_Y; ++y)
            for (int z = 0; z < SIZE_Z; ++z)
                array1d[i++] = 0; // ++i would skip index 0 and write one past the end
    delete[] array1d;
    std::cout << "1 dimension fast: " << time(0) - startTime << "s\n";

    return 0;
}

Result: the 3D version is much faster than the naive 1D version, and only a bit slower than the fast 1D version.

EDIT: I changed the 1 dimensional array loop to this:

for (int z = 0; z < SIZE_Z; ++z)
    for (int y = 0; y < SIZE_Y; ++y)
        for (int x = 0; x < SIZE_X; ++x)
            array1d[x + SIZE_X * y + SIZE_XY * z] = 0;

And it took only 5 seconds, as fast as the 3D variant.

So it seems to be the order of access that matters, not the number of dimensions.

Sorry about the long answer.

It is mostly about the memory access pattern. But first, a little about benchmarking:

  • When benchmarking, never count in seconds; a second is far too coarse. Use at least milliseconds.
  • Don't include parts you don't want to test in the benchmarked section. In the given example, that's new and delete; they should be outside.
  • Changing the order of benchmarks can produce different results because of cache utilisation.
  • Make sure all benchmarked versions follow the same algorithm (if you are testing implementations, not the algorithms themselves). This isn't true in the given example; I'll explain a bit later.

Now back to arrays. First of all, in the given example memset should be used rather than reinventing the wheel. I understand it is for testing purposes, but in that case it would be better to use e.g. rand() (although the sizes should be lowered, as rand() is much, much slower than =0 and would take too long). But no matter, here it goes:

In the 3-dimensional version, your innermost loop accesses a linear array. This is a very cache-friendly and fast access pattern. Dereferencing isn't performed on every loop iteration, because the compiler can see that the pointers can't change, so the most heavily used line of code, the innermost loop, accesses a linear memory array.

The 'fast' version of the 1D array does the same. Also a good one. memset is still better, though :-).

But when it comes to the 'slow' 1D version, things get messed up. Look at your index line: array1d[x + SIZE_X * y + SIZE_XY * z] = 0; . The innermost loop iterates z, so each iteration writes to an int very far from the previous one. This access pattern simply makes the data cache useless, and most of the time your program just waits for data to be written to memory. However, if you change it to array1d[SIZE_XY * x + SIZE_X * y + z] = 0; , it once again becomes a linear array access and therefore becomes very fast. And if you want, the left part of the addition can be computed in the outer loops, potentially making it a bit faster still.

But the real strength of a 1D array is that it can be accessed linearly from start to end. If the algorithm that uses it can be rearranged to traverse the array that way, it's a win-win scenario.

If you want to test this, just change the [x][y][z] order in your 3D version to [z][y][x] and watch the performance drop dramatically.

So, about the initial question: the answer is 'it depends'. Most of all it depends on the data access pattern, but also on many other things, like the actual number of dimensions, the size of each dimension, the frequency of supporting operations like new/delete, and much more. But if you can linearise the data access, it will already be fast; then again, in that case you don't really need 3D, right?

(Yes, I am obviously in favour of 1D arrays with a manually calculated index, so count me as biased. Sorry.)

Why don't you simply check the disassembly of each option and find out?

Of course, the disassembly depends on the compiler in use, which in turn depends on the CPU architecture and its supported operations.

This is in fact the most important statement here, as each option may have its own advantages and disadvantages over the other, depending on your platform (compiler, linker, processor).

So without specifying the underlying platform, there might not be a decisive answer to the general question at hand.


The answer below is divided into two cases.

In each case, it examines both options (a 1D array and a 3D array), using the disassembly of each option compiled with Microsoft Visual C++ 2010 for a Pentium E5200 as an example.

Case #1 - Statically Allocated Arrays

#define X 10
#define Y 10
#define Z 10

int val = array3d[x][y][z];
mov         eax,dword ptr [x]  
imul        eax,eax,190h  
add         eax,dword ptr [array3d]  
mov         ecx,dword ptr [y]  
imul        ecx,ecx,28h  
add         eax,ecx  
mov         edx,dword ptr [z]  
mov         eax,dword ptr [eax+edx*4]  
mov         dword ptr [val],eax  

int val = array1d[x+X*y+X*Y*z];
mov         eax,dword ptr [y]  
imul        eax,eax,0Ah  
add         eax,dword ptr [x]  
mov         ecx,dword ptr [z]  
imul        ecx,ecx,64h  
add         eax,ecx  
mov         edx,dword ptr [array1d]  
mov         eax,dword ptr [edx+eax*4]  
mov         dword ptr [val],eax  

As you can see, the "math" is slightly different, but apart from that these two options are practically identical. So the only thing that may affect performance here is runtime caching, though I don't see how either of these two options has a clear advantage over the other in that regard.

Case #2 - Dynamically Allocated Arrays

#define X 10
#define Y 10
#define Z 10

int val = array3d[x][y][z];
mov         eax,dword ptr [x]  
mov         ecx,dword ptr [array3d]  
mov         edx,dword ptr [ecx+eax*4]  
mov         eax,dword ptr [y]  
mov         ecx,dword ptr [edx+eax*4]  
mov         edx,dword ptr [z]  
mov         eax,dword ptr [ecx+edx*4]  
mov         dword ptr [val],eax  

int val = array1d[x+X*y+X*Y*z];
mov         eax,dword ptr [y]  
imul        eax,eax,0Ah  
add         eax,dword ptr [x]  
mov         ecx,dword ptr [z]  
imul        ecx,ecx,64h  
add         eax,ecx  
mov         edx,dword ptr [array1d]  
mov         eax,dword ptr [edx+eax*4]  
mov         dword ptr [val],eax  

This time, the results are notably different, yet it's rather hard to determine which one (if any) is consistently better than the other. When using a 3D array, there are considerably more dependent load ( mov ) operations than when using a 1D array, so runtime performance here depends heavily on where each level of the array resides in memory (RAM, L2 cache, etc.).
