
C++ save and load huge vector<bool>

I have a huge vector<vector<bool>> (512 x 44,000,000 bits). Creating it takes 4-5 hours of calculation, and obviously I want to save the results so that I never have to repeat the process. When I run the program again, all I want to do is load the same vector (no other app will use this file).

I believe text files are out of the question at this size. Is there a simple (quick and dirty) way to do this? I do not use Boost and this is only a minor part of my scientific app, so it must be something quick. I also thought of inverting it online and storing it in a Postgres DB (44,000,000 records of 512 bits each), so the DB can handle it easily. I have seen answers such as "take 8 bits > 1 byte and then save", but with my limited newbie C++ experience, they sound too complicated. Any ideas?

You can save 8 bits into a single byte:

unsigned char saver(bool bits[])
{
   unsigned char output=0;
   for(int i=0;i<8;i++)
   {
           output=output|(bits[i]<<i); //probably faster than if(bits[i]){output|=(1<<i);}
           //example: starting from output = 00000000
           //first iteration (i=0) gives:  00000001 only if bits[0] is true
           //second (i=1) gives:           0000001x only if bits[1] is true
           //third (i=2) gives:            000001xx only if bits[2] is true
           //fifth (i=4) gives:            0000xxxx if bits[4] is false
           // x is the value set by an earlier iteration
   }
   return output;
}

You can load 8 bits from a single byte:

void loader(unsigned char var, bool * bits)
{
   for(int i=0;i<8;i++)
   {
       bits[i] = (var & (1 << i)) != 0; // true if bit i of var is set
       // for example you loaded var as "200" which is 11001000 in binary
       // 11001000 --> zeroth iteration (i=0) gets false
       // first (i=1) gets false
       // second (i=2) gets false
       // third (i=3) gets true
       //...
   }
}

1<<0 is 1  -----> 00000001
1<<1 is 2  -----> 00000010
1<<2 is 4  -----> 00000100
1<<3 is 8  -----> 00001000
1<<4 is 16  ----> 00010000
1<<5 is 32  ----> 00100000
1<<6 is 64  ----> 01000000
1<<7 is 128  ---> 10000000
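The same shifts extend from 8 bits to a whole row. Below is a minimal sketch, assuming the row is a std::vector<bool> and the bit order matches saver and loader above (least significant bit first). The names pack_row and unpack_row are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: packs a row of bools into bytes. Element 8*k + i
// ends up in bit i of byte k (least significant bit first).
std::vector<unsigned char> pack_row(const std::vector<bool>& row)
{
    std::vector<unsigned char> bytes((row.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < row.size(); ++i)
        if (row[i])
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    return bytes;
}

// Hypothetical inverse: restores n_bits bools from the packed bytes.
// n_bits must be passed in because the bytes do not record how many
// bits of the last byte are padding.
std::vector<bool> unpack_row(const std::vector<unsigned char>& bytes, std::size_t n_bits)
{
    assert(bytes.size() * 8 >= n_bits);
    std::vector<bool> row(n_bits);
    for (std::size_t i = 0; i < n_bits; ++i)
        row[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
    return row;
}
```

The packed bytes of each row can then be written with a single fwrite call, and unpack_row applied to whatever fread returns.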

Edit: Using GPGPU, an embarrassingly parallel algorithm that takes 4-5 hours on a CPU can potentially be shortened to 0.04-0.05 hours (roughly 2-3 minutes) on a GPU, or even less than a minute with multiple GPUs. For example, the "saver/loader" functions above are embarrassingly parallel.

I have seen answers such as "take 8 bits > 1 byte and then save", but with my limited newbie C++ experience, they sound too complicated. Any ideas?

If you are going to read the file often, this would be a good time to learn bitwise operations. Storing one bit per bool takes 1/8th the space of storing one byte per bool. That's going to save a lot of memory and I/O.
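To put numbers on that for the dimensions quoted in the question, here is a small compile-time sketch. The constant names are made up for illustration.

```cpp
#include <cstdint>

// Dimensions quoted in the question.
constexpr std::uint64_t rows = 512;
constexpr std::uint64_t bits_per_row = 44000000;

// Total number of bools; also the size in bytes at one byte per bool.
constexpr std::uint64_t total_bits = rows * bits_per_row;
// Size in bytes at one bit per bool (eight bools per byte).
constexpr std::uint64_t packed_bytes = total_bits / 8;

static_assert(total_bits == 22528000000ULL, "about 22.5 GB at one byte per bool");
static_assert(packed_bytes == 2816000000ULL, "about 2.8 GB at one bit per bool");
```

Both figures exceed what a signed 32-bit int can hold, so sizes and offsets for this data are safer in a 64-bit type such as std::uint64_t, or std::size_t on a 64-bit build.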

So save it as one bit per bool, then break it into chunks, read it using memory mapping (e.g. mmap), or both. You can put this behind a usable interface, so you need to implement it just once and can hide the serialized format from the code that reads the values.
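As a rough illustration of such an interface, here is a POSIX-only sketch of a read-only view over a memory-mapped file. The class name BitMatrixView and the file layout (rows stored back to back, row length a multiple of 8 bits, least significant bit first in each byte) are assumptions made for this example, not something the question specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view over a bit-packed file (POSIX only).
class BitMatrixView {
public:
    BitMatrixView(const char* path, std::size_t bits_per_row)
        : bytes_per_row_(bits_per_row / 8)
    {
        assert(bits_per_row % 8 == 0);  // assumed layout: whole bytes per row
        int fd = open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                           PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const unsigned char*>(p);
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~BitMatrixView()
    {
        if (data_) munmap(const_cast<unsigned char*>(data_), size_);
    }
    BitMatrixView(const BitMatrixView&) = delete;
    BitMatrixView& operator=(const BitMatrixView&) = delete;

    // False if the file could not be opened or mapped.
    bool ok() const { return data_ != nullptr; }

    // Value stored at (row, col); pages are read from disk on first touch.
    bool get(std::size_t row, std::size_t col) const
    {
        std::size_t byte = row * bytes_per_row_ + col / 8;
        assert(byte < size_);
        return ((data_[byte] >> (col % 8)) & 1u) != 0;
    }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t bytes_per_row_;
};
```

Callers only see get(row, col), so the on-disk format can change later without touching the code that reads the values.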

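Proceed as described above. Here vec is the vector<vector<bool>>: its bits are packed eight at a time into bytes, the bytes are collected in a buffer, and the buffer is written to the file after each sub-vector.

 #include <cstdio>
 #include <vector>

 void save_bits(const std::vector<std::vector<bool>>& vec)
 {
     std::vector<unsigned char> buf;
     int cmp = 0;                // number of bits already placed in output
     unsigned char output = 0;   // byte currently being filled
     FILE* of = fopen("out.bin", "wb");
     if (!of) return;            // could not open the file
     for (const auto& subvec : vec)
     {
         for (bool b : subvec)
         {
             output = output | ((b ? 1 : 0) << cmp);
             cmp++;
             if (cmp == 8)
             {
                 buf.push_back(output);
                 cmp = 0;
                 output = 0;
             }
         }
         // write the bytes completed during this sub-vector; leftover bits
         // carry over into the first byte of the next sub-vector
         if (!buf.empty())
             fwrite(&buf[0], 1, buf.size(), of);
         buf.clear();
     }
     if (cmp != 0)               // flush a final, partially filled byte
         fwrite(&output, 1, 1, of);
     fclose(of);
 }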
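Loading is the reverse process. The sketch below assumes the file holds one continuous bit stream, row after row, least significant bit first within each byte, which is the layout the saving code above is meant to produce. The function name load_bits is hypothetical, and the dimensions must be supplied by the caller because the file does not store them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical loader: reads rows * cols bits from a bit-packed file.
// Returns an empty vector if the file cannot be opened or is too short.
std::vector<std::vector<bool>> load_bits(const char* path, std::size_t rows, std::size_t cols)
{
    assert(path != nullptr);
    std::vector<std::vector<bool>> vec;
    FILE* in = std::fopen(path, "rb");
    if (!in) return vec;

    vec.assign(rows, std::vector<bool>(cols));
    std::vector<unsigned char> buf(1 << 16);  // read in 64 KiB chunks
    std::size_t have = 0, pos = 0;            // bytes in buf, next unread byte
    int bit = 8;                              // 8 forces a fetch of the first byte
    unsigned char cur = 0;                    // byte currently being unpacked

    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (bit == 8) {
                if (pos == have) {            // buffer exhausted, refill it
                    have = std::fread(buf.data(), 1, buf.size(), in);
                    pos = 0;
                    if (have == 0) {          // file ended before all bits were read
                        std::fclose(in);
                        vec.clear();
                        return vec;
                    }
                }
                cur = buf[pos++];
                bit = 0;
            }
            vec[r][c] = ((cur >> bit) & 1) != 0;
            ++bit;
        }
    }
    std::fclose(in);
    return vec;
}
```

For the data in the question the call would be something like load_bits("out.bin", 512, 44000000).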
