
Reading HDF5 into C++ with memory problems

I am rewriting code I originally developed in Python in C++, mainly for an improvement in speed, but also in the hope of gaining more experience with the language. I also plan to use OpenMP to parallelize this code across 48 cores that share 204 GB of memory.

The program I am writing is simple: I import an HDF5 file containing a 3D dataset A[T][X][E], where T indexes each timestep of a simulation, X is the position where the field is measured, and E (0:2) holds the x, y, z components of the electric field.
Each element of A is a double, and the dimensions are A[15000][80][3].

The first hiccup I have run into is reading this 'large' h5 file into an array, and I would like a professional opinion before I continue. My first attempt:

...
#define RANK  3
#define DIM1  15001
#define DIM2  80
#define DIM3  3

using namespace std;

int main(void)
{
    // HDF5 handles for the file and the dataset.
    hid_t  file1, dataset1;
    herr_t ret;
    unsigned int i;

    // Buffer for the full dataset, declared as a local variable.
    double bufnew[DIM1][DIM2][DIM3];

    file1    = H5Fopen(FILE1, H5F_ACC_RDWR, H5P_DEFAULT);
    dataset1 = H5Dopen(file1, "EFieldOnLine", H5P_DEFAULT);
    ret      = H5Dread(dataset1, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                       H5P_DEFAULT, bufnew);

    cout << "Let's try dumping 0->100 elements" << endl;
    for (i = 0; i < 100; i++) cout << bufnew[i][20][2] << endl;
...

This leads to a segmentation fault at the array declaration. My next thought was to use either a dynamically allocated 3D array (via new) or a 3D vector. However, I have seen much debate against both methods, and, more importantly, I only need ONE component of E; that is, I would like to reshape A[T][X][E] -> B[T][X] for, say, the x-component of E.
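
For what it's worth, I have also been looking at HDF5's hyperslab selection, which, if I understand the API correctly, would let me read just one component of E straight from the file into a 2D buffer. A rough, untested sketch of what I mean (reusing dataset1 from above):

hid_t filespace = H5Dget_space(dataset1);
hsize_t start[3] = {0, 0, 0};         // begin at T = 0, X = 0, E = 0 (x-component)
hsize_t count[3] = {DIM1, DIM2, 1};   // every T, every X, one E component
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

// 2D memory dataspace matching the DIM1 x DIM2 selection.
hsize_t mdims[2] = {DIM1, DIM2};
hid_t memspace = H5Screate_simple(2, mdims, NULL);

double *B = new double[DIM1 * DIM2];  // B[T][X], on the heap
H5Dread(dataset1, H5T_NATIVE_DOUBLE, memspace, filespace, H5P_DEFAULT, B);
// B[t*DIM2 + x] is now the x-component at timestep t, position x.

H5Sclose(memspace);
H5Sclose(filespace);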

Sorry for the lengthy post, but I wanted to be as clear as possible, and I would like to emphasize again that I am interested in learning how to write the fastest and most efficient code. I appreciate all of your suggestions, time, and wisdom.

Defining an array as a local variable means allocating it on the stack. The stack is usually limited to a few megabytes, and a stack overflow reliably produces a segfault. Your buffer holds 15001 × 80 × 3 doubles at 8 bytes each, roughly 29 MB, which far exceeds a typical 8 MB stack limit. Large data structures should instead be allocated on the heap dynamically (using the new operator) or statically (by defining them as global variables).
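
For illustration, a minimal sketch of those two alternatives, using the question's dimensions (the std::vector variant is one idiomatic way to get heap storage without a manual delete[]):

#include <vector>

#define DIM1 15001
#define DIM2 80
#define DIM3 3

// Static storage duration: the buffer lives outside the stack.
static double bufstatic[DIM1][DIM2][DIM3];

int main(void)
{
    // Heap storage managed by a single flat std::vector: freed
    // automatically when the vector goes out of scope.
    std::vector<double> bufheap(DIM1 * DIM2 * DIM3);
    return 0;
}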

I wouldn't advise making a vector of vectors of vectors for such dimensions.

Instead, creating a one-dimensional array to store all values

double *bufnew = new double[DIM1*DIM2*DIM3];

and accessing it with the following formula to calculate the linear position of a given 3D item

bufnew[(T*DIM2+X)*DIM3+E] = ... ; // bufnew[T][X][E]

should work ok.
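
Putting it together with the reshape mentioned in the question, a minimal, untested sketch (it assumes file1 and dataset1 have been opened as in the question's code):

double *bufnew = new double[DIM1 * DIM2 * DIM3];
herr_t ret = H5Dread(dataset1, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                     H5P_DEFAULT, bufnew);

// Extract the x-component: B[T][X] = A[T][X][0].
double *B = new double[DIM1 * DIM2];
for (unsigned int t = 0; t < DIM1; t++)
    for (unsigned int x = 0; x < DIM2; x++)
        B[t * DIM2 + x] = bufnew[(t * DIM2 + x) * DIM3 + 0];

delete[] bufnew;  // the full buffer is no longer needed

Alternatively, the hyperslab selection sketched in the question avoids ever allocating the full DIM1*DIM2*DIM3 buffer.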
