Read a 2-D array in parallel using C
I am working on a physical simulation code (a C program) with intensive I/O. At each time step, I need to load a 2-D array from a binary file on disk and then process it. To load the array, I use fseek to move the file position and fread to actually read the data. However, this I/O significantly slows down the program, especially when dealing with large models.

So I am thinking about using OpenMP to speed it up. Basically, I read the binary file row by row using fseek:
#pragma omp parallel for private(ix, Fp)
for (ix = 0; ix < nx; ix++) {
    fseek(Fp, sizeof(float) * (nx * nz * (it - 2) + ix * nz), SEEK_SET); // Move the pointer
    fread(array[ix], sizeof(float), nz, Fp); // Read array
}
The code works fine without the #pragma line, but it gives me a segmentation fault when I include it. Any idea how to fix it? Or, more generally, what is the fastest way to read a 2-D (or even multidimensional) array from a binary file, possibly in parallel? Any suggestions would be helpful. Thank you in advance.
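A likely cause of the segmentation fault in the loop above: listing Fp in the private clause gives each thread its own uninitialized copy of the pointer, so fseek and fread are called on garbage. Even with a shared Fp, a separate fseek followed by fread is not atomic across threads, so rows could be read from the wrong position. The following is a minimal sketch of one way around both problems, assuming a POSIX system: pread() takes the offset as an argument and does not touch the shared file position. The function name read_slice_parallel and its parameters are hypothetical; slice stands for the zero-based slice index (it - 2 in the loop above).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one nx-by-nz slice of floats into array[ix][0..nz-1].
 * fd is an already opened file descriptor; slice is the zero-based slice index.
 * Returns 0 on success, -1 if any row could not be read completely. */
int read_slice_parallel(int fd, float **array, long nx, long nz, long slice)
{
    int failed = 0;

    /* Each iteration only uses its own locals plus read-only shared values,
     * so no private clause is needed; failed is combined with a reduction. */
    #pragma omp parallel for reduction(|:failed)
    for (long ix = 0; ix < nx; ix++) {
        off_t offset = (off_t)sizeof(float) *
                       ((off_t)nx * nz * slice + (off_t)ix * nz);
        size_t want = sizeof(float) * (size_t)nz;

        /* pread reads at an explicit offset; no seek, no shared position. */
        ssize_t got = pread(fd, array[ix], want, offset);
        if (got != (ssize_t)want)
            failed |= 1;   /* this sketch treats short reads as errors */
    }
    return failed ? -1 : 0;
}
```

A call would look like read_slice_parallel(fd, array, nx, nz, it - 2), with fd obtained from open() or fileno(Fp). With GCC, for example, the pragma takes effect when compiling with -fopenmp; without it the loop simply runs serially. Whether parallel reads are actually faster depends on the storage device and on how much of the file is already cached.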
Consider using mmap() or mmap64() to map the whole file into memory as an array. There is no buffered FILE* and no fseek(), just a pointer and careful pointer arithmetic. You can overwrite data, too, if you set up the mapping for writing and if that helps. The kernel uses available RAM as a cache of the file, and the virtual memory system reads and writes it, so changes made through a shared mapping still reach the file even if your code aborts. Other processes can also look at the file without extra overhead, using mmap() or any kind of file I/O. It's one of the most powerful library routines. Of course, if the data is written in string form or in the wrong endian order, there is extra overhead for converting it. There are also mmap() options for copy-on-write.
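As a minimal sketch of the mmap() approach, assuming a POSIX system and a file that holds consecutive nx-by-nz slices of native-endian floats (the layout implied by the question's offset formula): the names float_map, float_map_open, float_map_row and float_map_close are hypothetical helpers, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical read-only view of a whole file of floats. */
typedef struct {
    const float *data;  /* start of the mapping */
    size_t bytes;       /* length of the mapping in bytes */
} float_map;

/* Map the whole file read-only. Returns 0 on success, -1 on failure. */
int float_map_open(const char *path, float_map *m)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    /* With PROT_READ | PROT_WRITE, MAP_PRIVATE would give copy-on-write
     * pages, while MAP_SHARED would write changes back to the file. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    m->data = p;
    m->bytes = (size_t)st.st_size;
    return 0;
}

/* Pointer to row ix of the given zero-based slice, or NULL if out of range. */
const float *float_map_row(const float_map *m, size_t nx, size_t nz,
                           size_t slice, size_t ix)
{
    size_t first = (slice * nx + ix) * nz;  /* index of the row's first float */
    if ((first + nz) * sizeof(float) > m->bytes)
        return NULL;
    return m->data + first;
}

/* Release the mapping. */
void float_map_close(float_map *m)
{
    munmap((void *)m->data, m->bytes);
    m->data = NULL;
    m->bytes = 0;
}
```

With this, float_map_row(&m, nx, nz, it - 2, ix) plays the role of the fseek/fread pair: no data is copied, and pages are loaded by the kernel when they are first touched. Because the mapping is read-only, several OpenMP threads can read different rows from it at the same time without any locking.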