Improve performance of 3d array

Question

I try to loop through big 3D array of structures and it works so slowly. Then I used 1D array instead of 3D, but without success.

I use structure below to describe parameters of one cell of 3D mesh:

struct cellStruct
{
    double v1;
    // more variables here
    double v15;
    double v16;
    double v17;
    double v18;
};

Please take a look to two used approaches.

3D arrays

 #define Nx 500 #define Ny 500 #define Nz 500 cellStruct ***cell; cell = new cellStruct **[Nx]; for(int i=0;i<Nx;i++) { cell[i]=new cellStruct *[Ny]; for(int j=0;j<Ny;j++) cell[i][j]=new cellStruct [Nz]; } for (i = 0; i< Nx; ++i) for (j = 0; j< Ny; ++j) for (k = 0; k< Nz; ++k) { // big algorithm that uses array like in string below cell[i][j][k+1].v1 = cell[i][j+1][k-1].v2 * cell[i+1][Ny-1][k+1].v5; }

1D array

 #define cell(i,j,k) (cells[(i)*Ny*Nz + (j)*Ny + (k)]) cellStruct *cells = new cellStruct [Nx*Ny*Nz]; for (i = 1; i< Nx-1; ++i) for (j = 1; j< Ny-1; ++j) for (k = 1; k< Nz-1; ++k) { cell(i,j,k+1).v1 = cell(i,j+1,k-1).v2 * cell(i+1,Ny-1,k+1).v5; }

Program works more slowly in case 2. How else I can improve approach of working with big 3D array? Using float variables speed up calculations twice, but I want to have more accuracy. Maybe is better to use structure with pointers to variables inside like below?

struct cells
{
    double ***v1;
    // ...
    double ***v15;
    double ***v16;
    double ***v17;
    double ***v18;
};

Answer 1

well 500^3 is quite a size -> 125M cells

so static allocation is out of question
each double is 8Byte so for each double inside 8B x 125M = 1G Bytes (G is 1000^3 not 1024^3 in this case !!!)
so ~1GB for each double variable inside cell
so define what means slow runtime for n[GB] data processing ?

You can do only this:

1.rewrite computation to be more effective

so data computed fit inside at least L1 cache
this means compute all in chunks
of course if you do not have a repetitive use of cells then you have nothing to improve with this

2.use multithreading

try to parallelize your computations
but your data is so huge so this approach can even slow things down so be careful !!!
when your data do not fit inside Caches then your computation power is like more powerful 386 machine !!!

3.pack the input data

the best option for you is some cell packing algorithm
they are Voxel's or not ?
so adjacent cells should be similar (inside regions)
I strongly recommend RLE (run length encoding)
it is fast and very efficient for this kind of data (at least for kind of data I assume you use)
if your data is not suited for RLE then divide cells to regions and use DCT/iDCT (like on jpg compresion)
packing/unpacking data should greatly impove your computation times
because after packing your data set could be very very smaller then now

Answer 2

Since you want to improve your cache efficiency, converting an array of structures to a structure of arrays will help you.

I am almost sure you will have to convert your triple-indirect pointers to 1-D arrays too, in order to make the struct-of-arrays idea effective.

struct cellStruct
{
    double* v1; // you can use std::vector<double> instead of double*
    // more variables here
    double* v15;
    double* v16;
    double* v17;
    double* v18;
};

Since your calculation only uses v1 , v2 and v5 , caching all other variables is best disabled. Using the struct-of-arrays layout allocates different memory regions for v1 , v2 , v3 , etc - so you don't force the cache to load these useless v3 , v4 , v6 , ...

Some syntax tweaks:

#define CELL_ACCESS(cells,vn,i,j,k) (cells.vn[(i)*Ny*Nz + (j)*Ny + (k)])
cellStruct cells;
cells.v1 = new double[Nx*Ny*Nz]; // if you use std::vector, adjust code accordingly
cells.v2 = new double[Nx*Ny*Nz];
...
for (i = 1; i< Nx-1; ++i)
    for (j = 1; j< Ny-1; ++j)
        for (k = 1; k< Nz-1; ++k)
        {
            CELL_ACCESS(     cells, v1, i,   j,    k+1) =
                CELL_ACCESS( cells, v2, i,   j+1,  k-1) *
                CELL_ACCESS( cells, v5, i+1, Ny-1, k+1);
        }

Improve performance of 3d array

Question

2 answers

solution1
0 2014-02-11 10:59:02

solution2
0 2014-02-11 11:49:59

Improve performance of 3d array

Question

2 answers

solution1 0 2014-02-11 10:59:02

solution2 0 2014-02-11 11:49:59

solution1
0 2014-02-11 10:59:02

solution2
0 2014-02-11 11:49:59