Writing MPI result to a file

I have some code that solves an all-pairs shortest path problem, and each processor holds a piece of the result. I am trying to write this result, which is a matrix, to an output file, so each process will write its part of the solution to the file at the correct position. I am trying to use fseek for this but am a little stuck because the integers differ in printed width: 2 and -199, for example, take different amounts of space. How can I do this so that the processes do not overwrite each other? There might also be race conditions when writing.

Should I do this another way, or is there a way to accomplish this? I was thinking of sending all results to one process (rank 0) and having it build the array and write the file.

Don't use ASCII output; use binary, which is well defined in size.

So if you're using fstream and doubles:

#include <fstream>
#include <vector>

int main() {
    // Open the output file in binary mode.
    std::fstream filewriter("file.bin", std::ios::out | std::ios::binary);

    std::vector<double> mylist = {2.5, 7.6, 2.1, 3.2, 4.2};

    // Write the raw bytes of the vector's contiguous storage.
    filewriter.write(reinterpret_cast<const char*>(mylist.data()),
                     mylist.size() * sizeof(double));
}

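This writes exactly 40 bytes: the size of a double (8 bytes on typical platforms) times the number of elements in the list (5). Because every element has the same size, using fseek becomes very easy: element i starts at byte i * sizeof(double).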
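Applied to the matrix in the question, the fixed element size means each process can compute where its rows belong without knowing anything about the other processes. The following is only a sketch under assumptions that are not in the original posts: the matrix is n x n, stored row-major as 32-bit integers, each process owns whole rows, and the names create_matrix_file, write_rows and read_matrix are made up for illustration. It uses plain fstream seeking; in a real MPI program the MPI-IO routines (for example MPI_File_write_at) are probably the more natural tool, and whether concurrent writers through separate file handles behave well depends on the file system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Create (or truncate) the file and fill it with n*n zeros, so that it
// already has its final size before anyone seeks into it.
void create_matrix_file(const std::string& path, std::size_t n) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    std::vector<std::int32_t> zeros(n * n, 0);
    out.write(reinterpret_cast<const char*>(zeros.data()),
              static_cast<std::streamsize>(zeros.size() * sizeof(std::int32_t)));
}

// Write `block` (whole rows of an n-column matrix) starting at row `first_row`.
void write_rows(const std::string& path, std::size_t n, std::size_t first_row,
                const std::vector<std::int32_t>& block) {
    assert(n > 0 && block.size() % n == 0);  // the block must hold whole rows
    // in | out opens the existing file without truncating it.
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    // Every element is 4 bytes, so the offset of a row is a simple product.
    f.seekp(static_cast<std::streamoff>(first_row * n * sizeof(std::int32_t)));
    f.write(reinterpret_cast<const char*>(block.data()),
            static_cast<std::streamsize>(block.size() * sizeof(std::int32_t)));
}

// Read the whole n x n matrix back, e.g. to check the result.
std::vector<std::int32_t> read_matrix(const std::string& path, std::size_t n) {
    std::vector<std::int32_t> m(n * n);
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(m.data()),
            static_cast<std::streamsize>(m.size() * sizeof(std::int32_t)));
    return m;
}
```

With this layout, a process that owns rows 2 and 3 of a 4 x 4 matrix would call write_rows(path, 4, 2, block). The order in which processes write does not matter, because their byte ranges do not overlap.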

In scientific computing, binary output is strongly recommended when the output is large. However:

1- You have to learn about the concept of endianness (big endian, little endian).
2- You have to document your work properly for reuse (purpose, size, number of elements, dimensionality). I run into huge misunderstandings when I forget to document things (I'm a PhD physicist who programs simulations).
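Both points can be covered to some extent by putting a small header in front of the data. The sketch below is one possible approach, not an established format: the MatrixHeader layout, the kMagic value and the function names are all hypothetical. The idea is that the writer stores a known constant in its native byte order; a reader that sees the constant byte-reversed knows the file came from a machine with the opposite endianness and swaps the fields.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true if this machine stores the least significant byte first.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Reverse the byte order of a 32-bit value.
std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Hypothetical marker value; any constant that changes under byte_swap32 works.
const std::uint32_t kMagic = 0x4D415431u;

// Hypothetical self-describing header, written in the writer's native order.
struct MatrixHeader {
    std::uint32_t magic;      // kMagic
    std::uint32_t rows;       // number of rows
    std::uint32_t cols;       // number of columns
    std::uint32_t elem_size;  // bytes per element, e.g. 4 or 8
};

// Convert a header read from disk to this machine's byte order.
MatrixHeader to_native(MatrixHeader h) {
    if (h.magic == byte_swap32(kMagic)) {  // written with opposite endianness
        h.magic = byte_swap32(h.magic);
        h.rows = byte_swap32(h.rows);
        h.cols = byte_swap32(h.cols);
        h.elem_size = byte_swap32(h.elem_size);
    }
    assert(h.magic == kMagic);  // otherwise this is not one of our files
    return h;
}
```

The data elements themselves would need the same swap when the header indicates a foreign byte order.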

So ASCII for data analysis is not the right choice.

Luckily, there's a full library specialized in organizing this for you, called HDF5. It handles endianness and portability for you; however, it's not easy to work with, and it has a steep learning curve. I think that's a topic for later.

What I would recommend is that you learn how to deal with binary files and how to read them, and understand their issues and pitfalls. Since you use MPI, I think you're experienced enough to deal with binary files.

Here's a quick tutorial on binary files:

http://courses.cs.vt.edu/cs2604/fall02/binio.html

Cheers.

You could have each process write its output in some format that can be merged and cleaned up after the last one is done. For example: (x, y, z), (x, y, z), ... where x is the row index, y is the column index and z is the value.
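A minimal sketch of that idea follows, with some assumptions of my own: each entry is written as a text line "row col value" rather than with parentheses, the merged result is a dense n x n row-major matrix, and the names Entry, format_entries and merge_entries are invented for this example. Since every entry carries its own position, the order in which the pieces arrive does not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One matrix entry together with its position.
struct Entry {
    std::size_t row;
    std::size_t col;
    int value;
};

// What a single process would write: one "row col value" line per entry.
std::string format_entries(const std::vector<Entry>& entries) {
    std::ostringstream out;
    for (const Entry& e : entries) {
        out << e.row << ' ' << e.col << ' ' << e.value << '\n';
    }
    return out.str();
}

// Merge step: parse the concatenated output of all processes into a dense
// n x n row-major matrix. Positions that never appear stay 0.
std::vector<int> merge_entries(const std::string& text, std::size_t n) {
    std::vector<int> matrix(n * n, 0);
    std::istringstream in(text);
    Entry e;
    while (in >> e.row >> e.col >> e.value) {
        assert(e.row < n && e.col < n);  // reject out-of-range positions
        matrix[e.row * n + e.col] = e.value;
    }
    return matrix;
}
```

In practice each process would write its lines to its own file, and the merge would read those files one after another.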

This is a good job for memory-mapped files. They are system-dependent, but they're implemented in both the POSIX and Windows OS families, so on a modern OS they will work. There is a portable and C++-friendly implementation of them in Boost (classes mapped_file_source, mapped_file_sink and mapped_file). Interprocess output is a classic example of their usage.

They are binary, so most of what Samer said in his answer applies, too; the only difference is that you use pointer arithmetic instead of seeking.
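To illustrate the pointer arithmetic, here is a rough sketch that uses the POSIX calls directly (open, ftruncate, mmap, msync, munmap) instead of the Boost classes, so it will not build on Windows. The n x n int32 row-major layout and the names map_matrix, store_row and unmap_matrix are assumptions made for this example, and it assumes the processes run on one machine and see the same file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map `path` as an n x n matrix of int32 values, creating the file and
// setting its size if needed. Returns nullptr on failure.
std::int32_t* map_matrix(const char* path, std::size_t n) {
    const std::size_t bytes = n * n * sizeof(std::int32_t);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return nullptr;
    // Setting the same length again leaves existing data untouched.
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<std::int32_t*>(p);
}

// Copy one row of n values into the mapped matrix.
void store_row(std::int32_t* matrix, std::size_t n, std::size_t row,
               const std::int32_t* values) {
    assert(row < n);
    std::int32_t* dst = matrix + row * n;  // pointer arithmetic replaces fseek
    for (std::size_t i = 0; i < n; ++i) dst[i] = values[i];
}

// Flush the changes to the file and remove the mapping.
void unmap_matrix(std::int32_t* matrix, std::size_t n) {
    const std::size_t bytes = n * n * sizeof(std::int32_t);
    msync(matrix, bytes, MS_SYNC);
    munmap(matrix, bytes);
}
```

Each process would map the file, call store_row for the rows it owns, and unmap it again; as long as the rows are disjoint, no process touches another's bytes.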
