FORTRAN: Best way to store large amount of data which is readable in MATLAB

Question

I am working on developing an application in Fortran where I have points defining quadrilateral panels on the surface of an object. I am calculating various parameters on these quadrilateral panels for a number of frequencies.

The output file should look like:

FREQUENCY,PANEL_NUMBER,X1,Y1,Z1,X2,Y2,Z2,X3,Y3,Z3,X4,Y4,Z4,AREA,PRESSURE,....
0.01,1,....
0.01,2,....
0.01,3,....
.
.
.
.
0.01,2000,....
0.02,1,....
0.02,2,....
.
.
.
0.02,2000,...
.
.

I am expecting a maximum of 300,000 rows with 30 columns. Data types are composed of integer, real and complex numbers. I want to store this file and later read the file in MATLAB to create a 3D geometry which I will color based on pressure at each panel.

The problem is, as you can see from the file structure, there is lot of data. I am currently writing this as a CSV file and the size is about 26GB.

I do not want to use database to handle this. Could anyone suggest what file format I should write this data using FORTRAN.

Thanks for your help, Amitava

Answer 1

Store the data in the native format of the computer rather than in a human-readable file in which the numbers have been converted to base 10 and characters. This will produce the smallest file and the fastest to process. On the Fortran open statement, use form='unformatted', access='stream' . The first causes the file to be unformatted, the second causes Fortran not to include its usual record-length information, which is Fortran specific. This omission makes the file more portable to other languages. Someone else can help better with how to read the file in MATLAB; I found this on the web: http://www.mathworks.com/help/matlab/import_export/importing-binary-data-with-low-level-io.html

UPDATE: This approach has several assumptions. It might not work easily if you wish to transport the file between different types of computers. Your question implies that want many rows of identical content. Identical rows simply matches a file structure with that number of identical records. It seems that you want to read the entire file, in which case a sequential file is appropriate. If you wish to read "random" records, a Fortran direct access file might be useful. With the simplicity of identical records, using a native file format seems easy. If you want self-documentation or portability across computers (different numeric representations), a file format such as HDF or FITS would be useful.

Answer 2

I second @steabert's mention of NetCDF and there's also HDF5 (on which the NetCDF 4 format is built). However, it does depend on what you mean by "data types": they are best used with regular/rigid data layouts and NetCDF's support for Fortran derived types can be painful at times.

Possible advantages for cases with large lumps are data transparent compression; data checksumming; and possibly more natural random access (that is, no need to compute seek positions based on array index) compared with Fortran stream access. That's on top of the usual things of a self-documenting and portable file format.

MATLAB has inbuilt support for reading these files, and recent versions also support the OPeNDAP framework so you wouldn't even need to have the file on the same (or multiple) machine(s).

Of course, disadvantages: extra software; extra skills development (especially for HDF5); and increased code complexity on the Fortran side.

FORTRAN: Best way to store large amount of data which is readable in MATLAB

Question

2 answers

solution1
3 ACCPTED 2014-01-11 06:15:46

solution2
2 2014-01-11 12:12:28

FORTRAN: Best way to store large amount of data which is readable in MATLAB

Question

2 answers

solution1 3 ACCPTED 2014-01-11 06:15:46

solution2 2 2014-01-11 12:12:28

solution1
3 ACCPTED 2014-01-11 06:15:46

solution2
2 2014-01-11 12:12:28