C++ save and load huge vector<bool>

Question

I have a huge vector<vector<bool>> (512 x 44,000,000 bits). It takes 4-5 hours of calculation to create it, and obviously I want to save the result so that I never have to repeat the process. When I run the program again, all I want to do is load the same vector (no other app will use this file).

I believe text files are out of the question for data this large. Is there a simple (quick and dirty) way to do this? I do not use Boost and this is only a minor part of my scientific app, so it must be something quick. I also thought of inverting (transposing) it online and storing it in a Postgres DB (44,000,000 records of 512 bits each), so the DB can handle it easily. I have seen answers such as "pack 8 bits into 1 byte and then save", but with my limited newbie C++ experience they sound too complicated. Any ideas?

Answer 1

You can save 8 bits into a single byte:

unsigned char saver(bool bits[])
{
   unsigned char output=0;
   for(int i=0;i<8;i++)
   {
           output=output|(bits[i]<<i); //probably faster than if(){output|=(1<<i);}
           //example: starting from output = 00000000
           //first iteration gives:          0000000a  where a = bits[0]
           //second iteration gives:         000000ba  where b = bits[1]
           //third iteration gives:          00000cba  where c = bits[2]
           //... and so on, until bits[7] lands in the highest bit
   }
   return output;
}

You can load 8 bits from a single byte:

void loader(unsigned char var, bool * bits)
{
   for(int i=0;i<8;i++)
   {
       bits[i] = (var & (1 << i)) != 0;   // true if bit i of var is set
       // for example you loaded var as "200" which is 11001000 in binary
       // 11001000 --> zeroth iteration gets false
       // first gets false
       // second gets false
       // third gets true
       //...
   }
}

1<<0 is 1  -----> 00000001
1<<1 is 2  -----> 00000010
1<<2 is 4  -----> 00000100
1<<3 is 8  -----> 00001000
1<<4 is 16  ----> 00010000
1<<5 is 32  ----> 00100000
1<<6 is 64  ----> 01000000
1<<7 is 128  ---> 10000000
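
To apply the same bit order to a whole row instead of a fixed array of eight bools, something like the following might work. This is only a sketch: pack_row and unpack_row are hypothetical names that do not appear in the answer, and the last byte is simply padded with zero bits when the row length is not a multiple of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack a row of bools into bytes, lowest bit first:
// element 8*k + i ends up in bit i of byte k (same order as saver above).
std::vector<unsigned char> pack_row(const std::vector<bool>& row)
{
    std::vector<unsigned char> bytes((row.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < row.size(); ++i)
        if (row[i])
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    return bytes;
}

// Unpack the first `count` bits of `bytes` back into a row of bools.
std::vector<bool> unpack_row(const std::vector<unsigned char>& bytes,
                             std::size_t count)
{
    assert(bytes.size() * 8 >= count);   // enough bytes for the requested bits
    std::vector<bool> row(count);
    for (std::size_t i = 0; i < count; ++i)
        row[i] = ((bytes[i / 8] >> (i % 8)) & 1u) != 0;
    return row;
}
```

The row length has to be remembered separately (here it is passed as count), because the padding bits in the last byte cannot be told apart from real false values.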

Edit: Using GPGPU, an embarrassingly parallel algorithm that takes 4-5 hours on a CPU can be shortened to 0.04-0.05 hours on a GPU (or even less than a minute with multiple GPUs). For example, the saver/loader functions above are embarrassingly parallel.

Answer 2

"I have seen answers such take 8bits > 1byte and then save, but with my limited newbie C++ experience, they sound too complicated. Any ideas?"

If you are going to read the file often, this would be a good time to learn bitwise operations. Storing one bit per bool instead of one byte per bool makes the file 1/8th the size. That's going to save a lot of memory and I/O.
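
To put numbers on that for the 512 x 44,000,000 case from the question, here is a small sketch of the arithmetic. The function names are made up for illustration, and it assumes each row is padded to a whole number of bytes.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for rows x cols bits when each row is padded to whole bytes.
std::uint64_t packed_bytes(std::uint64_t rows, std::uint64_t cols)
{
    const std::uint64_t bytes_per_row = (cols + 7) / 8;
    assert(rows == 0 || bytes_per_row <= UINT64_MAX / rows);  // no overflow
    return rows * bytes_per_row;
}

// Bytes needed when every bool is written as its own byte.
std::uint64_t unpacked_bytes(std::uint64_t rows, std::uint64_t cols)
{
    assert(rows == 0 || cols <= UINT64_MAX / rows);           // no overflow
    return rows * cols;
}
```

packed_bytes(512, 44000000) is 2,816,000,000 (about 2.8 GB), while unpacked_bytes(512, 44000000) is 22,528,000,000 (about 22.5 GB). Both are too large for a 32-bit signed int, so sizes and offsets should be kept in 64-bit types.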

So save it as one bit per bool, then break it into chunks and/or read it using memory mapping (e.g. mmap). You can put this behind a usable interface, so you need to implement it just once and can hide the serialized format from the code that reads the values.
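
One way such an interface could look is sketched below. BitMatrix is a hypothetical class, not an existing library type, and the file layout (two 64-bit dimensions in native byte order followed by the packed rows) is an assumption chosen for simplicity. It uses plain stream I/O rather than mmap, and it trusts the header, which seems acceptable since no other app touches the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rows x cols bit matrix kept in one flat, packed byte buffer.
class BitMatrix {
public:
    BitMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), stride_((cols + 7) / 8),
          data_(rows * stride_, 0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    bool get(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return ((data_[r * stride_ + c / 8] >> (c % 8)) & 1u) != 0;
    }

    void set(std::size_t r, std::size_t c, bool v) {
        assert(r < rows_ && c < cols_);
        unsigned char& byte = data_[r * stride_ + c / 8];
        const unsigned char mask = static_cast<unsigned char>(1u << (c % 8));
        byte = static_cast<unsigned char>(v ? (byte | mask) : (byte & ~mask));
    }

    // Writes the dimensions, then the whole buffer in one call.
    bool save(const std::string& path) const {
        std::ofstream out(path, std::ios::binary);
        const std::uint64_t header[2] = {static_cast<std::uint64_t>(rows_),
                                         static_cast<std::uint64_t>(cols_)};
        out.write(reinterpret_cast<const char*>(header), sizeof header);
        out.write(reinterpret_cast<const char*>(data_.data()),
                  static_cast<std::streamsize>(data_.size()));
        return static_cast<bool>(out);
    }

    // Reads a file written by save(); leaves m untouched on failure.
    static bool load(const std::string& path, BitMatrix& m) {
        std::ifstream in(path, std::ios::binary);
        std::uint64_t header[2] = {0, 0};
        if (!in.read(reinterpret_cast<char*>(header), sizeof header))
            return false;
        BitMatrix tmp(static_cast<std::size_t>(header[0]),
                      static_cast<std::size_t>(header[1]));
        if (!in.read(reinterpret_cast<char*>(tmp.data_.data()),
                     static_cast<std::streamsize>(tmp.data_.size())))
            return false;
        m = std::move(tmp);
        return true;
    }

private:
    std::size_t rows_, cols_, stride_;   // stride_ = bytes per row
    std::vector<unsigned char> data_;
};
```

The rest of the program only calls get and set, so the storage could later be swapped for a memory-mapped file without touching the callers.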

Answer 3

Proceed as described above. Here vec is the vector of vector of bool: the bits of each sub-vector are packed eight at a time into bytes, those bytes are pushed into a buffer vector, and the buffer is written to the file.

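 #include <cstdio>
 #include <vector>

 // Packs every bit of vec (lowest bit first) into bytes and writes them to out.bin.
 bool save_bits(const std::vector<std::vector<bool>>& vec)
 {
     std::FILE* of = std::fopen("out.bin", "wb");   // "b": binary mode, no newline translation
     if (!of) return false;

     std::vector<unsigned char> buf;
     int cmp = 0;                 // number of bits already placed in output
     unsigned char output = 0;    // byte currently being filled
     for (const auto& subvec : vec)
     {
         for (bool b : subvec)
         {
             output = output | ((b ? 1 : 0) << cmp);
             cmp++;
             if (cmp == 8)
             {
                 buf.push_back(output);
                 cmp = 0;
                 output = 0;
             }
         }
         // write the bytes completed in this row; a partial byte carries over
         if (!buf.empty())
             std::fwrite(buf.data(), 1, buf.size(), of);
         buf.clear();
     }
     if (cmp != 0)                // flush the last, partially filled byte
         std::fwrite(&output, 1, 1, of);

     return std::fclose(of) == 0;
 }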
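
The answer only covers saving. A possible counterpart for loading is sketched here. load_bits is a hypothetical name, and the sketch assumes the file is one continuous bit stream, lowest bit first, with no header, so the caller has to pass the dimensions (512 and 44,000,000 in the question). It also assumes the file contains the final byte even when the total number of bits is not a multiple of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads rows x cols bits from a packed file into vec.
// Returns false if the file cannot be opened or ends too early.
bool load_bits(const char* path, std::size_t rows, std::size_t cols,
               std::vector<std::vector<bool>>& vec)
{
    assert(cols > 0);
    std::FILE* in = std::fopen(path, "rb");
    if (!in) return false;

    vec.assign(rows, std::vector<bool>(cols));
    std::vector<unsigned char> buf(1 << 16);   // read in 64 KiB chunks
    std::size_t r = 0, c = 0;                   // next position to fill
    std::size_t got = 0;
    while (r < rows &&
           (got = std::fread(buf.data(), 1, buf.size(), in)) > 0)
    {
        for (std::size_t k = 0; k < got && r < rows; ++k)
        {
            for (int bit = 0; bit < 8 && r < rows; ++bit)
            {
                vec[r][c] = ((buf[k] >> bit) & 1u) != 0;
                if (++c == cols) { c = 0; ++r; }   // move to the next row
            }
        }
    }
    std::fclose(in);
    return r == rows;   // true only if every row was filled completely
}
```

Reading in chunks keeps the temporary buffer small; only the vector itself needs to fit in memory.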

3 answers

solution1
3 2013-07-13 15:32:46

solution2
2 2013-07-13 15:26:14

solution3
1 ACCEPTED 2013-07-13 15:50:57
