
C++ save and load huge vector<bool>

I have a huge vector<vector<bool>> (512 x 44,000,000 bits). Creating it takes 4-5 hours of computation, so I obviously want to save the result and never have to repeat the process. When I run the program again, all I want to do is load the same vector (no other application will use this file).

I believe text files are out of the question at this size. Is there a simple (quick and dirty) way to do this? I do not use Boost, and this is only a minor part of my scientific application, so it has to be something quick. I also thought of transposing it on the fly and storing it in a Postgres DB (44,000,000 records of 512 bits each), which the DB could handle easily. I have seen answers that pack 8 bits into 1 byte and then save, but with my limited, newbie C++ experience they sound too complicated. Any ideas?

You can save 8 bits into a single byte:

unsigned char saver(const bool bits[8])
{
   unsigned char output=0;
   for(int i=0;i<8;i++)
   {

           output=output|(bits[i]<<i); //probably faster than if(bits[i]){output|=(1<<i);}
           //example, starting from 00000000:
           //i=0 sets bit 0: 00000001 only if bits[0] is true
           //i=1 sets bit 1: 0000001x only if bits[1] is true
           //i=2 sets bit 2: 000001xx only if bits[2] is true
           //...and so on up to bit 7; a false element leaves its bit at 0
           // x is the value set by the earlier iterations

   }
   return output;
}
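For comparison, the standard library's std::bitset can do the same packing without hand-written shifts. This is an alternative technique, not the answer's code; pack_with_bitset is a hypothetical name, and the sketch assumes the same bit order as saver (element i goes to bit i).

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Alternative to saver(): pack 8 bools into one byte using std::bitset.
// Element i of 'bits' becomes bit i of the result (same order as saver).
unsigned char pack_with_bitset(const bool bits[8])
{
    std::bitset<8> b;
    for (std::size_t i = 0; i < 8; ++i)
        b[i] = bits[i];                  // bitset::operator[] gives access to bit i
    unsigned long value = b.to_ulong();  // an 8-bit bitset always fits in 0..255
    assert(value <= 255);
    return static_cast<unsigned char>(value);
}
```

For example, {true, false, false, true, false, false, false, false} sets bits 0 and 3 and packs to 9, exactly as saver would.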

You can load 8 bits from a single byte:

void loader(unsigned char var, bool * bits)
{

   for(int i=0;i<8;i++)
   {

       bits[i] = var & (1 << i);
       // for example you loaded var as "200" which is 11001000 in binary
       // 11001000 --> zeroth iteration gets false
       // first gets false
       // second false
       // third gets true 
       //...
   }

}
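saver and loader handle one byte at a time. For the question's rows of 44,000,000 bools, the same bit order can be applied to a whole std::vector<bool> row. pack_row and unpack_row below are hypothetical helpers sketched under that assumption, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack a whole row of bools, 8 per byte, with the same bit order as saver():
// element i goes to bit (i % 8) of byte (i / 8). A partial last byte is zero-padded.
std::vector<unsigned char> pack_row(const std::vector<bool>& row)
{
    std::vector<unsigned char> bytes((row.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < row.size(); ++i)
        if (row[i])
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    return bytes;
}

// Inverse of pack_row(): 'n' is the number of bools originally packed,
// needed because the padding bits in the last byte carry no information.
std::vector<bool> unpack_row(const std::vector<unsigned char>& bytes, std::size_t n)
{
    assert(bytes.size() * 8 >= n);
    std::vector<bool> row(n);
    for (std::size_t i = 0; i < n; ++i)
        row[i] = (bytes[i / 8] >> (i % 8)) & 1u;   // same bit test as loader()
    return row;
}
```

Since 44,000,000 is a multiple of 8, each of the question's rows packs into exactly 5,500,000 bytes with no padding.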

1<<0 is 1  -----> 00000001
1<<1 is 2  -----> 00000010
1<<2 is 4  -----> 00000100
1<<3 is 8  -----> 00001000
1<<4 is 16  ----> 00010000
1<<5 is 32  ----> 00100000
1<<6 is 64  ----> 01000000
1<<7 is 128  ---> 10000000
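As a worked check of the loader's example, 200 decomposes into the masks from this table:

```latex
200 = 128 + 64 + 8 = 2^7 + 2^6 + 2^3 = 11001000_2
```

so var & (1 << i) is nonzero only for i = 3, 6 and 7, and bits[3], bits[6] and bits[7] are the only elements set to true.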

Edit: With GPGPU, an embarrassingly parallel algorithm that takes 4-5 hours on a CPU could, depending on the algorithm and hardware, be shortened to roughly 0.04-0.05 hours (about 2.4-3 minutes) on a GPU, or even less than a minute with multiple GPUs. For example, the "saver"/"loader" functions above are embarrassingly parallel: each byte can be packed or unpacked independently of all the others.
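To illustrate that independence, here is a sketch of a packing loop whose iterations share no state. It uses CPU threads through OpenMP rather than a GPU, so it is a stand-in for the GPGPU approach, not an implementation of it. pack_parallel is a hypothetical name, and any speed-up depends on compiler flags and hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack the bools in 'in' into ceil(n/8) bytes (bit k of byte j = element 8*j + k).
// Each output byte depends only on its own 8 inputs, so the loop iterations are
// independent. With OpenMP enabled (e.g. -fopenmp in GCC/Clang) they run on
// several threads; without it the pragma is ignored and the loop runs serially.
std::vector<unsigned char> pack_parallel(const std::vector<bool>& in)
{
    const std::ptrdiff_t nbytes = static_cast<std::ptrdiff_t>((in.size() + 7) / 8);
    std::vector<unsigned char> out(static_cast<std::size_t>(nbytes), 0);
    #pragma omp parallel for
    for (std::ptrdiff_t j = 0; j < nbytes; ++j)
    {
        unsigned char byte = 0;
        for (std::size_t k = 0; k < 8; ++k)
        {
            std::size_t i = static_cast<std::size_t>(j) * 8 + k;
            if (i < in.size() && in[i])      // concurrent reads of a const vector are safe
                byte |= static_cast<unsigned char>(1u << k);
        }
        out[static_cast<std::size_t>(j)] = byte;   // each iteration writes a different byte
    }
    assert(out.size() * 8 >= in.size());
    return out;
}
```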

Quoting the question: "I have seen answers such take 8bits > 1byte and then save, but with my limited newbie C++ experience, they sound too complicated. Any ideas?"

If you are going to read the file often, this would be a good time to learn bitwise operations. Storing one bit per bool makes the data 1/8th the size of storing one byte per bool: 512 x 44,000,000 bits is 2,816,000,000 bytes (about 2.8 GB) when packed, versus about 22.5 GB at one byte each. That saves a lot of memory and I/O.

So save it as one bit per bool, then break it into chunks and/or read it using memory mapping (e.g. mmap). You can put this behind a usable interface, so you only need to implement it once, and the serialized format stays hidden whenever you read the values.
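A minimal sketch of such an interface, assuming a file of rows x cols bits where each row is padded to whole bytes and bit c of a row sits in bit (c % 8) of byte (c / 8). PackedBitMatrix and get are hypothetical names. For simplicity this version reads the whole file into memory with std::ifstream instead of using mmap; a memory-mapped version could keep the same get() interface.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read-only view of a packed bit matrix; callers use get() and never see
// the on-disk layout (rows padded to whole bytes, LSB-first within a byte).
class PackedBitMatrix
{
public:
    PackedBitMatrix(const std::string& path, std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), bytes_per_row_((cols + 7) / 8),
          data_(rows * bytes_per_row_)
    {
        std::ifstream in(path, std::ios::binary);
        if (!in.read(reinterpret_cast<char*>(data_.data()),
                     static_cast<std::streamsize>(data_.size())))
            throw std::runtime_error("cannot read " + path);
    }

    // Value of element (row, col), decoded from the packed bytes.
    bool get(std::size_t row, std::size_t col) const
    {
        assert(row < rows_ && col < cols_);
        unsigned char byte = data_[row * bytes_per_row_ + col / 8];
        return (byte >> (col % 8)) & 1u;
    }

private:
    std::size_t rows_, cols_, bytes_per_row_;   // declared before data_, so initialized first
    std::vector<unsigned char> data_;
};
```

If the file later moves to mmap or to chunked reads, only the constructor and the storage member need to change.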

Proceeding as described above: here vec is the vector of vectors of bool. We pack the bits of each sub-vector into bytes, 8 at a time, push those bytes into a buffer vector, and write the buffer to the file one sub-vector at a time.

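#include <cstdio>
#include <vector>

// Save 'vec' (a vector of rows of bools) to 'path', packing 8 bools per byte.
// Returns false if the file cannot be opened.
bool save(const std::vector<std::vector<bool>>& vec, const char* path = "out.bin")
{
    std::vector<unsigned char> buf;
    int cmp = 0;                                   // bits already placed in 'output'
    unsigned char output = 0;
    std::FILE* of = std::fopen(path, "wb");        // "wb": write, binary mode
    if (!of)
        return false;
    for (const auto& subvec : vec)
    {
        for (bool b : subvec)
        {
            output = static_cast<unsigned char>(output | ((b ? 1 : 0) << cmp));
            cmp++;
            if (cmp == 8)                          // a full byte is ready
            {
                buf.push_back(output);
                cmp = 0;
                output = 0;
            }
        }
        if (cmp != 0)                              // flush a partial byte so each row starts byte-aligned
        {
            buf.push_back(output);
            cmp = 0;
            output = 0;
        }
        std::fwrite(buf.data(), 1, buf.size(), of);
        buf.clear();
    }
    std::fclose(of);
    return true;
}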
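The answer only shows the saving side. A matching reader could look like the sketch below, assuming every row was written starting on a byte boundary (true for the question's data, since 44,000,000 is a multiple of 8) and that the caller knows the row count and row length, which are not stored in the file. load is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Read back a file written one packed row after another (bit c of a row in
// bit c % 8 of byte c / 8). Returns false if the file is missing or too short.
bool load(const char* path, std::size_t rows, std::size_t cols,
          std::vector<std::vector<bool>>& vec)
{
    std::FILE* in = std::fopen(path, "rb");            // "rb": read, binary mode
    if (!in)
        return false;
    std::vector<unsigned char> buf((cols + 7) / 8);    // one packed row
    vec.assign(rows, std::vector<bool>(cols));
    for (std::size_t r = 0; r < rows; ++r)
    {
        if (std::fread(buf.data(), 1, buf.size(), in) != buf.size())
        {
            std::fclose(in);                           // file shorter than expected
            return false;
        }
        for (std::size_t c = 0; c < cols; ++c)
            vec[r][c] = (buf[c / 8] >> (c % 8)) & 1u;
    }
    std::fclose(in);
    assert(vec.size() == rows);
    return true;
}
```

Reading one packed row at a time keeps the extra buffer small (5,500,000 bytes per row for the question's data) while the full vector<vector<bool>> is rebuilt.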

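Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.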

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM