Efficiently writing/reading an array of '1' and '-1's to a binary file

I am a computational-physics graduate student, and my research requires me to write a large array of '1' and '-1' values to one or more binary files. So far I have come up with the following MWE:

#include <fstream>
#include <sstream>
#include <string>
#include <bitset>

const int Num = 1024;

std::string int_array_to_string(int state[], int start, int finish){
    std::ostringstream oss("");
    for (int i=start; i<start+finish; i++)
        switch(state[i]){
            case -1: oss << 0; break;
            case  1: oss << 1; break;
        }
    return oss.str();
}
void printToBinary(int state[], std::ostream &output){
    for (int i=0; i<Num; i+=32){
        std::bitset<32> x( int_array_to_string(state, i, 32));
        unsigned long n = x.to_ulong();
        output.write(reinterpret_cast<const char*>(&n), sizeof(n));
    }
}
void fakeUpSomeData(int state[]){
    int ans = 1;
    for (int i=0; i<Num; i++){
        ans *= -1;
        state[i] = ans;
    }
}
int main(void){
    int state[Num] = {0};
    fakeUpSomeData(state);

    std::ofstream output("output.bin", std::ios::binary);

    printToBinary(state, output);

    return 0;
}

This, however, makes my program run three times slower than before, and I'm certain there must be a better way to do this.

Additionally, it would be useful to be able to retrieve individual chunks of the data later. That is, if I store the three states

{1,-1,1}
{1,-1,1}
{1,1,-1}

into one file, it would be useful to have a way to read the first chunk, then the second chunk, then the third chunk.

Some background on why I need this: I will need to store roughly 1024*1e5 up to 9632*1e6 of these ints to calculate low- and high-resolution predictions for neutron scattering. Being able to read out chunks of some size 'N' would therefore be far more useful than storing 1e6 separate binary files in a folder (just typing that option sounds ridiculous!).

Finally, I have considered using the HDF5 package, but it seems like overkill, and I was unable to get an MWE working with it.

Any thoughts on how to improve the MWE would be appreciated. Thank you for your time.

Check out this answer: Writing a binary file in C++ very fast

In summary, try using C-style I/O: forget about output streams and use the POSIX calls open() and write() to write directly to a file descriptor.
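As a rough illustration of that suggestion, here is a minimal sketch that packs the values into bits in memory and then hands the whole buffer to write(), instead of building a string for every 32 values. The mapping (+1 to bit 1, -1 to bit 0) follows the question's MWE; the function names pack_spins and write_state, and the choice of least-significant-bit-first ordering within each byte, are assumptions made for this example only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <sys/types.h>  // ssize_t
#include <unistd.h>     // write (POSIX)

// Pack n values of +1 / -1 into bits: +1 -> 1, -1 -> 0.
// Eight values per byte, least-significant bit first (an assumed convention).
std::vector<std::uint8_t> pack_spins(const int* state, std::size_t n) {
    std::vector<std::uint8_t> packed((n + 7) / 8, 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (state[i] == 1) {
            packed[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
    }
    return packed;
}

// Write one packed state to an already open file descriptor.
// Returns true only if every byte was written.
bool write_state(int fd, const int* state, std::size_t n) {
    const std::vector<std::uint8_t> packed = pack_spins(state, n);
    std::size_t done = 0;
    while (done < packed.size()) {
        // write() may transfer fewer bytes than requested, so loop.
        const ssize_t w = write(fd, packed.data() + done, packed.size() - done);
        if (w <= 0) {
            return false;
        }
        done += static_cast<std::size_t>(w);
    }
    return true;
}
```

The descriptor would typically come from a single open("output.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644) call (declared in <fcntl.h>) made once before the simulation loop, with write_state called for each state. Whether this removes the slowdown depends on the rest of the program, so it is worth timing.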

You could even use read() with a buffer whose size equals the number of bytes needed to store one chunk of your NxN binary states, and read the chunks in one at a time.
