C++ storing 0 and 1 more efficiently, like in a binary file?

I want to store multiple arrays whose entries are all either 0 or 1. The file would be quite large if I keep writing it the way I do now. Here is a minimal version of what I currently do.

#include <iostream>
#include <fstream>
using namespace std;

int main(){
    ofstream File;
    File.open("test.csv");
    int array[4]={1,0,0,1};
    for(int i = 0; i < 4; ++i){
        File << array[i] << endl;   
    }
    File.close();
    return 0;
}

So basically, is there a way of storing this in a binary file or something similar, since my data is 0 or 1 in the first place? If yes, how do I do this? Can I still have line breaks and maybe even commas in that file? If either of the latter does not work, that's also fine. More importantly, how do I store this as a binary file that contains only 0 and 1, so that my file is smaller? Thank you very much!

The obvious solution is to take 64 characters, say A-Z, a-z, 0-9, and + and /, and have each character encode six entries in your table. There is, in fact, a standard for this called Base64. In Base64, A encodes 0,0,0,0,0,0 while / encodes 1,1,1,1,1,1. Each combination of six zeroes or ones has a corresponding character.

This still leaves commas, spaces, and newlines free for your use as separators.
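As a rough illustration of the six-entries-per-character idea, here is a minimal sketch. The helper names `encodeBits` and `decodeBits` are hypothetical, and the code only borrows the Base64 alphabet for groups of six entries; it is not a full Base64 implementation (no `=` padding, no byte-oriented input). It assumes the first entry of each group is the most significant bit and that the original entry count is stored somewhere else, since the last group is padded with zeros.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
static const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: packs six 0/1 entries into each output character.
// The first entry of each group becomes the most significant bit.
// A final partial group is padded with zeros, so the caller has to
// remember the original length in order to decode exactly.
std::string encodeBits(const std::vector<int>& bits)
{
    std::string out;
    for (std::size_t i = 0; i < bits.size(); i += 6) {
        unsigned value = 0;
        for (std::size_t j = 0; j < 6; ++j) {
            value <<= 1;
            if (i + j < bits.size() && bits[i + j] != 0) value |= 1u;
        }
        out.push_back(kBase64Alphabet[value]);
    }
    return out;
}

// Hypothetical inverse: expands each alphabet character back into six
// entries, skips any other character (commas, spaces, newlines), and
// trims the result to bitCount entries.
std::vector<int> decodeBits(const std::string& text, std::size_t bitCount)
{
    const std::string alphabet(kBase64Alphabet);
    std::vector<int> bits;
    for (char c : text) {
        const std::size_t value = alphabet.find(c);
        if (value == std::string::npos) continue;  // separator, not data
        for (int shift = 5; shift >= 0; --shift)
            bits.push_back(static_cast<int>((value >> shift) & 1u));
    }
    if (bits.size() > bitCount) bits.resize(bitCount);
    return bits;
}
```

With these assumptions, the array {1,0,0,1} from the question is padded to 100100 (decimal 36) and encodes to the single character k, and `decodeBits("k", 4)` gives back {1,0,0,1}. Because separators are skipped on decode, each array can sit on its own line of an ordinary text file.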

If you want to store the data as compactly as possible, I'd recommend storing it as binary data, where each bit in the binary file represents one boolean value. This will allow you to store 8 boolean values for each byte of disk space you use.

If you want to store arrays whose lengths are not multiples of 8, it gets a little more complicated, since you can't store a partial byte. You can solve that problem by storing an extra byte of metadata at the end of the file that specifies how many bits of the final data byte are valid and how many are just padding.

Something like this:

#include <iostream>
#include <fstream>
#include <cstdint>
#include <sys/types.h>  // ssize_t (POSIX; not part of standard C++)
#include <vector>

using namespace std;

// Given an array of ints that are either 1 or 0, returns a packed-array
// of uint8_t's containing those bits as compactly as possible.
vector<uint8_t> packBits(const int * array, size_t arraySize)
{
   const size_t vectorSize = ((arraySize+7)/8)+1;  // round up, then +1 for the metadata byte

   vector<uint8_t> packedBits;
   packedBits.resize(vectorSize, 0);

   // Store 8 boolean-bits into each byte of (packedBits)
   for (size_t i=0; i<arraySize; i++)
   {
      if (array[i] != 0) packedBits[i/8] |= (1<<(i%8));
   }

   // The last byte in the array is special; it holds the number of
   // valid bits that we stored to the byte just before it.
   // That way if the number of bits we saved isn't an even multiple of 8,
   // we can use this value later on to calculate exactly how many bits we should restore
   packedBits[vectorSize-1] = arraySize%8;
   return packedBits;
}

// Given a packed-bits vector (i.e. as previously returned by packBits()),
// returns the vector-of-integers that was passed to the packBits() call.
vector<int> unpackBits(const vector<uint8_t> & packedBits)
{
   vector<int> ret;
   if (packedBits.size() < 2) return ret;

   const size_t validBitsInLastByte = packedBits[packedBits.size()-1]%8;
   const size_t numValidBits        = 8*(packedBits.size()-((validBitsInLastByte>0)?2:1)) + validBitsInLastByte;

   ret.resize(numValidBits);
   for (size_t i=0; i<numValidBits; i++)
   {
      ret[i] = (packedBits[i/8] & (1<<(i%8))) ? 1 : 0;
   }
   return ret;
}

// Returns the size of the specified file in bytes, or -1 on failure
static ssize_t getFileSize(ifstream & inFile)
{
   if (inFile.is_open() == false) return -1;

   const streampos origPos = inFile.tellg();  // record current seek-position
   inFile.seekg(0, ios::end);  // seek to the end of the file
   const ssize_t fileSize = inFile.tellg();   // record current seek-position
   inFile.seekg(origPos);  // so we won't change the file's read-position as a side effect
   return fileSize;
}

int main(){

    // Example of packing an array-of-ints into packed-bits form and saving it
    // to a binary file
    {
       const int array[]={0,0,1,1,1,1,1,0,1,0};

       // Pack the int-array into packed-bits format
       const vector<uint8_t> packedBits = packBits(array, sizeof(array)/sizeof(array[0]));

       // Write the packed-bits to a binary file
       ofstream outFile;
       outFile.open("test.bin", ios::binary);
       outFile.write(reinterpret_cast<const char *>(&packedBits[0]), packedBits.size());
       outFile.close();
    }

    // Now we'll read the binary file back in, unpack the bits to a vector<int>,
    // and print out the contents of the vector.
    {
       // open the file for reading
       ifstream inFile;
       inFile.open("test.bin", ios::binary);

       const ssize_t fileSizeBytes = getFileSize(inFile);
       if (fileSizeBytes < 0)
       {
          cerr << "Couldn't read test.bin, aborting" << endl;
          return 10;
       }

       // Read in the packed-binary data
       vector<uint8_t> packedBits;
       packedBits.resize(fileSizeBytes);
       inFile.read(reinterpret_cast<char *>(&packedBits[0]), fileSizeBytes);

       // Expand the packed-binary data back out to one-int-per-boolean
       vector<int> unpackedInts = unpackBits(packedBits);

       // Print out the int-array's contents
       cout << "Loaded-from-disk unpackedInts vector is " << unpackedInts.size() << " items long:" << endl;
       for (size_t i=0; i<unpackedInts.size(); i++) cout << unpackedInts[i] << "  ";
       cout << endl;
    }

    return 0;
}
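To make the on-disk layout concrete, the following sketch packs a vector the same way as the program above and formats the bytes as hex. The names `packWithTrailer` and `toHex` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Same layout as the program above: entry i goes to byte i/8 at bit
// position i%8, followed by one trailing byte holding (entry count) % 8.
std::vector<std::uint8_t> packWithTrailer(const std::vector<int>& bits)
{
    std::vector<std::uint8_t> packed((bits.size() + 7) / 8 + 1, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i] != 0) packed[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    packed.back() = static_cast<std::uint8_t>(bits.size() % 8);
    return packed;
}

// Hypothetical helper: formats bytes as space-separated hex, like a hex dump.
std::string toHex(const std::vector<std::uint8_t>& bytes)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof(buf), "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

For the ten entries {0,0,1,1,1,1,1,0,1,0}, `toHex(packWithTrailer(...))` yields "7C 01 02": entries 2 through 6 set bits 2 through 6 of the first byte (0x7C), entry 8 sets bit 0 of the second byte (0x01), and the trailer records that 2 bits of that second byte are valid. That is 3 bytes, compared with 20 bytes for the one-digit-per-line text version on a system that writes a one-byte newline.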

(You could probably make the file even more compact than that by running zip or gzip on the file after you write it out :) )

You can indeed write and read binary data. However, having line breaks and commas would be difficult. Imagine you save your data as boolean data, so only ones and zeros. A comma would then need a special character, but you only have ones and zeros. The next best thing would be to make an object of two booleans, one holding the actual data you need and the other indicating whether a comma follows (C++ would then read the data in pairs of bits).

But I doubt this is what you need. If you want to do something like a CSV, then it would be easy to fix the size of each column (for example, an int would be 4 bytes and a string no more than 32 chars) and then read and write accordingly. Suppose you have your binary file.

To initially save an array of objects, say pets, you would use

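// Write the whole array of fixed-size Pet records in one call.
// "wb" opens the file in binary mode so no newline translation occurs.
FILE *apFile = fopen(FILENAME, "wb");
if (apFile != NULL) {
    fwrite(ARRAY_OF_PETS, sizeof(Pet), SIZE_OF_ARRAY, apFile);
    fclose(apFile);
}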

To access the pet at index idx, you would use

Pet m;
// Open in binary mode and jump straight to the idx-th fixed-size record.
ifstream input_file(FILENAME, ios::in | ios::binary);
input_file.seekg(sizeof(Pet) * idx, ios::beg);
input_file.read(reinterpret_cast<char *>(&m), sizeof(Pet));
input_file.close();

You can also add data at the end, change data in the middle, and so on.
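Appending and in-place changes could look roughly like the sketch below. The answer does not define `Pet`, so the struct here is a hypothetical fixed-size record, and `appendPet`, `overwritePet`, and `readPet` are illustrative names. Writing a struct's raw bytes like this assumes the file is read back by a program built with the same compiler settings and on the same architecture, since padding and byte order are not portable.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical fixed-size record; the original answer does not define Pet.
struct Pet {
    char name[32];
    int  age;
};

// Appends one record to the end of the file, creating the file if needed.
bool appendPet(const std::string& path, const Pet& pet)
{
    std::ofstream file(path, std::ios::binary | std::ios::app);
    if (!file) return false;
    file.write(reinterpret_cast<const char*>(&pet), sizeof(Pet));
    return static_cast<bool>(file);
}

// Overwrites the record at position idx without touching its neighbours.
// The file must already exist; opening with in|out does not truncate it.
bool overwritePet(const std::string& path, std::size_t idx, const Pet& pet)
{
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) return false;
    file.seekp(static_cast<std::streamoff>(sizeof(Pet) * idx), std::ios::beg);
    file.write(reinterpret_cast<const char*>(&pet), sizeof(Pet));
    return static_cast<bool>(file);
}

// Reads the record at position idx into out; returns false if the file
// is missing or does not contain a complete record at that position.
bool readPet(const std::string& path, std::size_t idx, Pet& out)
{
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;
    file.seekg(static_cast<std::streamoff>(sizeof(Pet) * idx), std::ios::beg);
    file.read(reinterpret_cast<char*>(&out), sizeof(Pet));
    return static_cast<bool>(file);
}
```

Because every record has the same size, the byte offset of record idx is simply `sizeof(Pet) * idx`, which is what makes random access and in-place edits possible without rewriting the rest of the file.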
