简体   繁体   中英

Improve performance of 3d array

I try to loop through big 3D array of structures and it works so slowly. Then I used 1D array instead of 3D, but without success.

I use structure below to describe parameters of one cell of 3D mesh:

struct cellStruct
{
    double v1;
    // more variables here
    double v15;
    double v16;
    double v17;
    double v18;
};

Please take a look to two used approaches.

  1. 3D arrays

     #define Nx 500 #define Ny 500 #define Nz 500 cellStruct ***cell; cell = new cellStruct **[Nx]; for(int i=0;i<Nx;i++) { cell[i]=new cellStruct *[Ny]; for(int j=0;j<Ny;j++) cell[i][j]=new cellStruct [Nz]; } for (i = 0; i< Nx; ++i) for (j = 0; j< Ny; ++j) for (k = 0; k< Nz; ++k) { // big algorithm that uses array like in string below cell[i][j][k+1].v1 = cell[i][j+1][k-1].v2 * cell[i+1][Ny-1][k+1].v5; } 
  2. 1D array

     #define cell(i,j,k) (cells[(i)*Ny*Nz + (j)*Ny + (k)]) cellStruct *cells = new cellStruct [Nx*Ny*Nz]; for (i = 1; i< Nx-1; ++i) for (j = 1; j< Ny-1; ++j) for (k = 1; k< Nz-1; ++k) { cell(i,j,k+1).v1 = cell(i,j+1,k-1).v2 * cell(i+1,Ny-1,k+1).v5; } 

Program works more slowly in case 2. How else I can improve approach of working with big 3D array? Using float variables speed up calculations twice, but I want to have more accuracy. Maybe is better to use structure with pointers to variables inside like below?

struct cells
{
    double ***v1;
    // ...
    double ***v15;
    double ***v16;
    double ***v17;
    double ***v18;
};

well 500^3 is quite a size -> 125M cells

  • so static allocation is out of question
  • each double is 8Byte so for each double inside 8B x 125M = 1G Bytes (G is 1000^3 not 1024^3 in this case !!!)
  • so ~1GB for each double variable inside cell
  • so define what means slow runtime for n[GB] data processing ?

You can do only this:

1.rewrite computation to be more effective

  • so data computed fit inside at least L1 cache
  • this means compute all in chunks
  • of course if you do not have a repetitive use of cells then you have nothing to improve with this

2.use multithreading

  • try to parallelize your computations
  • but your data is so huge so this approach can even slow things down so be careful !!!
  • when your data do not fit inside Caches then your computation power is like more powerful 386 machine !!!

3.pack the input data

  • the best option for you is some cell packing algorithm
  • they are Voxel's or not ?
  • so adjacent cells should be similar (inside regions)
  • I strongly recommend RLE (run length encoding)
  • it is fast and very efficient for this kind of data (at least for kind of data I assume you use)
  • if your data is not suited for RLE then divide cells to regions and use DCT/iDCT (like on jpg compresion)
  • packing/unpacking data should greatly impove your computation times
  • because after packing your data set could be very very smaller then now

Since you want to improve your cache efficiency, converting an array of structures to a structure of arrays will help you.

I am almost sure you will have to convert your triple-indirect pointers to 1-D arrays too, in order to make the struct-of-arrays idea effective.

struct cellStruct
{
    double* v1; // you can use std::vector<double> instead of double*
    // more variables here
    double* v15;
    double* v16;
    double* v17;
    double* v18;
};

Since your calculation only uses v1 , v2 and v5 , caching all other variables is best disabled. Using the struct-of-arrays layout allocates different memory regions for v1 , v2 , v3 , etc - so you don't force the cache to load these useless v3 , v4 , v6 , ...

Some syntax tweaks:

#define CELL_ACCESS(cells,vn,i,j,k) (cells.vn[(i)*Ny*Nz + (j)*Ny + (k)])
cellStruct cells;
cells.v1 = new double[Nx*Ny*Nz]; // if you use std::vector, adjust code accordingly
cells.v2 = new double[Nx*Ny*Nz];
...
for (i = 1; i< Nx-1; ++i)
    for (j = 1; j< Ny-1; ++j)
        for (k = 1; k< Nz-1; ++k)
        {
            CELL_ACCESS(     cells, v1, i,   j,    k+1) =
                CELL_ACCESS( cells, v2, i,   j+1,  k-1) *
                CELL_ACCESS( cells, v5, i+1, Ny-1, k+1);
        }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM