
Write std::bitset to a binary file and load the file back into std::bitset

I am working on a project that uses std::bitset. The text file provided is very large (>800 MB), and loading it directly into a std::bitset takes more than 25 seconds. So I want to preprocess the text file into a binary file that is a memory dump of the bitset. Because each 8-bit char in the text file becomes a single bit, the load time should drop considerably. I wrote some demo code:

#include <iostream>      
#include <bitset>         
#include <string>
#include <stdexcept>      
#include <fstream>
#include <math.h> 

int main () {
    const int MAX_SIZE = 19;
    try {

        std::string line = "1001111010011101011";
        int copy_bypes = (int)ceil((float)MAX_SIZE / 8.0);


        std::bitset<MAX_SIZE>* foo = new (std::nothrow)std::bitset<MAX_SIZE>(line);     // foo: 0000
        std::ofstream os ("data.dat", std::ios::binary);
        os.write((const char*)&foo, copy_bypes);
        os.close();


        std::bitset<MAX_SIZE>* foo2 = new (std::nothrow)std::bitset<MAX_SIZE>();
        std::ifstream input("data.dat",std::ios::binary);
        input.read((char*)&foo2, copy_bypes);
        input.close();

        for (int i = foo2->size() -1 ; i >=0 ; --i) {
            std::cout  << (*foo2)[i];
        }
        std::cout <<std::endl;
    }
    catch (const std::invalid_argument& ia) {
        std::cerr << "Invalid argument: " << ia.what() << '\n';
    }
    return 0;
}

It seems to work fine, but I am worried about whether this approach is really safe in a production environment.

Thanks in advance.

Writing a non-trivial class to a file as raw binary is really dangerous. You should convert the bitset to well-defined binary data. If you know that your data will fit in an unsigned long long, you can use bitset<>::to_ullong() and write/read that unsigned long long. If you want this to be portable between, e.g., 64-bit and 32-bit platforms, you should use fixed-size types.
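As a rough illustration of the to_ullong() suggestion, here is a minimal sketch. The helper names save_small_bitset and load_small_bitset are hypothetical, and the file layout (8 bytes, least significant byte first) is an assumption chosen so the file does not depend on the platform's integer size or byte order. It only applies to bitsets of at most 64 bits, so it does not cover the 800 MB case from the question.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: store a bitset of at most 64 bits as exactly 8 bytes,
// least significant byte first (layout is an assumption of this sketch).
template <std::size_t N>
bool save_small_bitset(const std::bitset<N>& bits, const std::string& path) {
    static_assert(N <= 64, "to_ullong() only covers bitsets of up to 64 bits");
    const std::uint64_t value = bits.to_ullong();
    char bytes[8];
    for (int i = 0; i < 8; ++i) {
        bytes[i] = static_cast<char>((value >> (8 * i)) & 0xFF);
    }
    std::ofstream out(path, std::ios::binary);
    out.write(bytes, sizeof bytes);
    out.flush();  // make sure a write error is visible before checking the state
    return static_cast<bool>(out);
}

// Hypothetical helper: read the 8 bytes back and rebuild the bitset.
template <std::size_t N>
bool load_small_bitset(std::bitset<N>& bits, const std::string& path) {
    static_assert(N <= 64, "to_ullong() only covers bitsets of up to 64 bits");
    char bytes[8];
    std::ifstream in(path, std::ios::binary);
    if (!in.read(bytes, sizeof bytes)) {
        return false;  // missing or truncated file
    }
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value |= static_cast<std::uint64_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    }
    bits = std::bitset<N>(value);
    return true;
}
```

With the 19-bit string from the question, calling save_small_bitset and then load_small_bitset on the same path should give back an equal bitset.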

These two lines are wrong:

os.write((const char*)&foo, copy_bypes);
input.read((char*)&foo2, copy_bypes);

You're passing the address of the pointer variables (foo and foo2), not the address of the std::bitset objects themselves. But even if that is corrected:

os.write((const char*)foo, copy_bypes);
input.read((char*)foo2, copy_bypes);

it would still be unsafe to use in a production environment. Here you're assuming that std::bitset is a POD type and accessing it as such. As your code becomes more complex, you risk writing or reading too much, and there are no safeguards to stop undefined behavior from happening. std::bitset was designed to be convenient, not fast, and that shows in the methods it provides to access bits: there is no proper way to obtain the address of its storage, as std::vector or std::string provide, for example. If you need performance, you'll need to write your own implementation.
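To make "your own implementation" a little more concrete, here is a minimal sketch of a bit array that owns its storage in a std::vector, so the bytes can be written and read directly. The class name PackedBits, its interface, and the file layout (an 8-byte little-endian bit count followed by the packed bytes, bit i stored in byte i / 8 at position i % 8) are all assumptions made for illustration. Index bounds checking and validation of the header value are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal bit array; name, interface, and file layout are illustrative only.
class PackedBits {
public:
    explicit PackedBits(std::size_t bit_count = 0)
        : bit_count_(bit_count), bytes_((bit_count + 7) / 8, 0) {}

    std::size_t size() const { return bit_count_; }

    // No bounds checking: i must be less than size().
    bool test(std::size_t i) const { return ((bytes_[i / 8] >> (i % 8)) & 1u) != 0; }

    void set(std::size_t i, bool value = true) {
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (i % 8));
        if (value) {
            bytes_[i / 8] = static_cast<std::uint8_t>(bytes_[i / 8] | mask);
        } else {
            bytes_[i / 8] = static_cast<std::uint8_t>(bytes_[i / 8] & ~mask);
        }
    }

    // Assumed layout: 8-byte little-endian bit count, then the packed bytes.
    bool save(const std::string& path) const {
        char header[8];
        const std::uint64_t count = bit_count_;
        for (int i = 0; i < 8; ++i) {
            header[i] = static_cast<char>((count >> (8 * i)) & 0xFF);
        }
        std::ofstream out(path, std::ios::binary);
        out.write(header, sizeof header);
        out.write(reinterpret_cast<const char*>(bytes_.data()),
                  static_cast<std::streamsize>(bytes_.size()));
        out.flush();
        return static_cast<bool>(out);
    }

    bool load(const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        char header[8];
        if (!in.read(header, sizeof header)) {
            return false;  // missing or truncated header
        }
        std::uint64_t count = 0;
        for (int i = 0; i < 8; ++i) {
            count |= static_cast<std::uint64_t>(static_cast<unsigned char>(header[i])) << (8 * i);
        }
        // A production version should sanity-check count before allocating.
        std::vector<std::uint8_t> bytes(static_cast<std::size_t>((count + 7) / 8));
        if (!in.read(reinterpret_cast<char*>(bytes.data()),
                     static_cast<std::streamsize>(bytes.size()))) {
            return false;  // truncated payload
        }
        bit_count_ = static_cast<std::size_t>(count);
        bytes_ = std::move(bytes);
        return true;
    }

private:
    std::size_t bit_count_;
    std::vector<std::uint8_t> bytes_;
};
```

Because the storage is a plain byte vector, save and load each transfer the payload in a single write or read call, which is the property std::bitset does not expose.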
