I loop through a big 3D array of structures, and it runs very slowly. I then switched from a 3D array to a 1D array, but that did not help.
I use the structure below to describe the parameters of one cell of a 3D mesh:
struct cellStruct
{
double v1;
// more variables here
double v15;
double v16;
double v17;
double v18;
};
Please take a look at the two approaches I tried.
3D arrays
#define Nx 500
#define Ny 500
#define Nz 500

cellStruct ***cell;
cell = new cellStruct **[Nx];
for (int i = 0; i < Nx; i++)
{
    cell[i] = new cellStruct *[Ny];
    for (int j = 0; j < Ny; j++)
        cell[i][j] = new cellStruct[Nz];
}

// interior cells only, so the neighbor indices (i+1, j+1, k-1, k+1) stay in bounds
for (int i = 1; i < Nx-1; ++i)
    for (int j = 1; j < Ny-1; ++j)
        for (int k = 1; k < Nz-1; ++k)
        {
            // the full algorithm accesses the array like the line below
            cell[i][j][k+1].v1 = cell[i][j+1][k-1].v2 * cell[i+1][Ny-1][k+1].v5;
        }
1D array
// row-major layout: stride Ny*Nz for i, Nz for j, 1 for k
#define cell(i,j,k) (cells[(i)*Ny*Nz + (j)*Nz + (k)])

cellStruct *cells = new cellStruct[Nx*Ny*Nz];

for (int i = 1; i < Nx-1; ++i)
    for (int j = 1; j < Ny-1; ++j)
        for (int k = 1; k < Nz-1; ++k)
        {
            cell(i,j,k+1).v1 = cell(i,j+1,k-1).v2 * cell(i+1,Ny-1,k+1).v5;
        }
The program runs even more slowly in case 2. How else can I speed up work with a big 3D array? Using float variables makes the calculation twice as fast, but I want more accuracy. Would it be better to use a structure with pointers to the variables inside, like below?
struct cells
{
double ***v1;
// ...
double ***v15;
double ***v16;
double ***v17;
double ***v18;
};
Well, 500^3 is quite a size: 125M cells. Assuming 18 doubles per cell (144 bytes), that is about 18 GB of data.
You can only do the following:
1. rewrite the computation to be more efficient
2. use multithreading
3. pack the input data
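To put the 125M figure in perspective, the memory footprint can be estimated directly. The following is only a minimal sketch: it assumes the struct holds exactly 18 doubles (v1 through v18, as the numbering in the question suggests), and the names `Cell18` and `footprintBytes` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the question's cellStruct, assuming 18 doubles.
struct Cell18
{
    double v[18];
};

// Total bytes needed for an nx * ny * nz grid of Cell18.
constexpr std::uint64_t footprintBytes(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz)
{
    return nx * ny * nz * sizeof(Cell18);
}
```

With 8-byte doubles, `footprintBytes(500, 500, 500)` comes to 18,000,000,000 bytes, roughly 18 GB. That is far beyond any CPU cache, and the sample line in the loop touches only three of the 18 fields in each cell.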
Since you want to improve your cache efficiency, converting an array of structures to a structure of arrays will help you.
I am almost sure you will have to convert your triple-indirect pointers to 1-D arrays too, in order to make the struct-of-arrays idea effective.
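Converting the triple-indirect pointers to a 1-D array needs a mapping from (i, j, k) to a single offset. A minimal sketch of a row-major mapping follows; `flatIndex` is a hypothetical helper name, and the layout (k varying fastest) is an assumption chosen to match the innermost k loop.

```cpp
#include <cstddef>

// Row-major offset of element (i, j, k) in a grid with ny * nz elements per i-slab.
// Stride for i is ny * nz, stride for j is nz, stride for k is 1.
inline std::size_t flatIndex(std::size_t i, std::size_t j, std::size_t k,
                             std::size_t ny, std::size_t nz)
{
    return (i * ny + j) * nz + k;
}
```

Note that the j stride is the extent of the k dimension (nz), not ny. The two only coincide when Ny == Nz, as in the 500 x 500 x 500 grid from the question.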
struct cellStruct
{
double* v1; // you can use std::vector<double> instead of double*
// more variables here
double* v15;
double* v16;
double* v17;
double* v18;
};
Since your calculation only uses v1, v2 and v5, it is best to keep all the other variables out of the cache. The struct-of-arrays layout allocates separate memory regions for v1, v2, v3, etc., so the cache is not forced to load the unused v3, v4, v6, ... alongside the values you actually need.
Some syntax tweaks:
// row-major layout: stride Ny*Nz for i, Nz for j, 1 for k
#define CELL_ACCESS(cells,vn,i,j,k) (cells.vn[(i)*Ny*Nz + (j)*Nz + (k)])
cellStruct cells;
cells.v1 = new double[Nx*Ny*Nz]; // if you use std::vector, adjust code accordingly
cells.v2 = new double[Nx*Ny*Nz];
...
for (int i = 1; i < Nx-1; ++i)
    for (int j = 1; j < Ny-1; ++j)
        for (int k = 1; k < Nz-1; ++k)
        {
            CELL_ACCESS(cells, v1, i, j, k+1) =
                CELL_ACCESS(cells, v2, i, j+1, k-1) *
                CELL_ACCESS(cells, v5, i+1, Ny-1, k+1);
        }
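The comments above mention std::vector as an alternative to raw pointers. A minimal sketch of that variant follows, under the assumption that only v1, v2 and v5 are needed. The names `CellsSoA`, `at` and `update` are hypothetical, and the loop body is just the sample line from the question, not the full algorithm.

```cpp
#include <cstddef>
#include <vector>

// Structure of arrays: each field lives in its own contiguous block.
struct CellsSoA
{
    std::size_t nx, ny, nz;
    std::vector<double> v1, v2, v5; // further fields would follow the same pattern

    CellsSoA(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_),
          v1(nx_ * ny_ * nz_, 0.0),
          v2(nx_ * ny_ * nz_, 0.0),
          v5(nx_ * ny_ * nz_, 0.0)
    {
    }

    // Row-major offset, k varying fastest.
    std::size_t at(std::size_t i, std::size_t j, std::size_t k) const
    {
        return (i * ny + j) * nz + k;
    }
};

// Applies the sample update from the question to all interior cells.
inline void update(CellsSoA& c)
{
    for (std::size_t i = 1; i + 1 < c.nx; ++i)
        for (std::size_t j = 1; j + 1 < c.ny; ++j)
            for (std::size_t k = 1; k + 1 < c.nz; ++k)
                c.v1[c.at(i, j, k + 1)] =
                    c.v2[c.at(i, j + 1, k - 1)] * c.v5[c.at(i + 1, c.ny - 1, k + 1)];
}
```

For example, `CellsSoA c(Nx, Ny, Nz); update(c);` would run one pass over the grid. The vectors release their memory automatically, so no matching delete[] calls are needed.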