
Independent parallel writing into files in C++ and MPI

I have implemented a code in C++ and MPI that is supposed to do millions of computations and save millions of numbers in about 7 files for each CPU working on its own data. I am using about 10,000 cores, which gives a total of 70,000 files, each with millions of lines, to be written in parallel.

I used ofstream for the writing, but for some reason the MPI code breaks in the middle and the files end up empty. I want each processor to write its 7 files independently of all the other processors. According to my search this could be done using MPI, but I have read about it in many resources and I still can't understand how it can be used for independent writing while specifying the file names dynamically during execution. If that is the correct way, can somebody please explain it in as much detail as possible? If not, please explain your alternative suggestion in as much detail as possible.

My current writing code, which doesn't work, looks something like this:

if (rank == 0)
{
    // create a directory for the output files
    // (ignore the error if the directory already exists)
    if (mkdir("Database", 0777) == -1)
    {
    }

    rowsCount = fillCombinations(BCombinations, RCombinations,
                                 BList, RList,
                                 maxCombinations, BIndexBegin,
                                 BIndexEnd, RIndexBegin,
                                 RIndexEnd,
                                 BCombinationsIndex, RCombinationsIndex);
}

// then broadcast all the arrays that will be used in all of the computations,
// and at the root send all the indexes to work on to the slaves; then at each slave:

for (int cc = BeginIndex; cc <= EndIndex; cc++)
{
    // begin by specifying the values that will be used
    // and making files for each B and R in the list
    BIndex = BCombinationsIndex[cc];
    RIndex = RCombinationsIndex[cc];

    // build the file names, indicating the R and B by their index
    std::ostringstream pNameBuf;
    pNameBuf << "Database/" << "P_" << "Beta_" << (BIndex + 1)
             << "_Rho_" << (RIndex + 1) << "_sampledP" << ".txt";
    std::string p_file_name = pNameBuf.str();

    std::ostringstream eNameBuf;
    eNameBuf << "Database/" << "E_" << "Beta_" << (BIndex + 1)
             << "_Rho_" << (RIndex + 1) << "_sampledE" << ".txt";
    std::string e_file_name = eNameBuf.str();

    // ... and so on for the 7 files

    // creating and opening the files
    std::ofstream pFile(p_file_name.c_str());
    std::ofstream eFile(e_file_name.c_str());

    // ... and so on for the 7 files
    // then I start the writing in the files, and at the end ...

    pFile.close();
    eFile.close();
}
// end of the segment loop
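For reference, the repeated name-building in the loop above can be collapsed into one small helper. This is just a sketch; the prefix/suffix pairs such as `("P", "sampledP")` and `("E", "sampledE")` mirror the names used in the question:

```cpp
#include <sstream>
#include <string>

// Build one of the per-(B, R) file names used in the question.
// Call it as make_file_name("P", BIndex, RIndex, "sampledP"), etc.
std::string make_file_name(const std::string& prefix, int bIndex, int rIndex,
                           const std::string& suffix)
{
    std::ostringstream name;
    name << "Database/" << prefix << "_Beta_" << (bIndex + 1)
         << "_Rho_" << (rIndex + 1) << "_" << suffix << ".txt";
    return name.str();
}
```

This keeps the seven name-building blocks down to seven one-line calls and makes the naming scheme easy to change in one place.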

Standard C++/C library I/O is not good enough to access that many files. Even the BG/L/P kernel will collapse if you try to access hundreds of thousands of files at the same time, which is quite close to your number. A large number of physical files also stresses the parallel file system with extra metadata.

Sophisticated supercomputers generally have a large number of dedicated I/O nodes, so why don't you use the standard MPI functions for parallel I/O? That should be enough for the number of files you would like to save.

You can start here : http://www.open-mpi.org/doc/v1.4/man3/MPI_File_open.3.php

Good luck!

Do you need to do the I/O yourself? If not, you could give the HDF5 library a try; it has become quite popular among scientists using HPC. It might be worth having a look at it, as it could simplify your work. For example, you can write everything into the same file and avoid having thousands of files. (Note that your performance might also depend on the filesystem of your cluster.)

Create 7 threads or processes, whichever you are using, and append the thread ID / process ID to the file being written. There should be no contention that way.

The Blue Gene architecture might only have a few years left, but the problem of how to do "scalable I/O" will remain with us for some time.

First, MPI-IO is essentially a requirement at this scale, particularly the collective I/O features. Even though this paper was written for /L, the lessons are still relevant:

  • collective open lets the library set up some optimizations
  • collective reads and writes can be transformed into requests that line up nicely with GPFS file system block boundaries (which is important for lock management and minimizing overhead)
  • the selection and placement of "I/O aggregators" can be done in a way that's mindful of the machine's topology

https://press3.mcs.anl.gov/romio/2006/02/15/romio-on-blue-gene-l/

The selection of aggregators is pretty complicated on /Q but the idea is that these aggregators are selected to balance I/O over all the available "system call I/O forwarding" (ciod) links:

https://press3.mcs.anl.gov/romio/2015/05/15/aggregation-selection-on-blue-gene/
