C++ storing 0 and 1 more efficiently, like in a binary file?

Question

I want to store multiple arrays whose entries are all either 0 or 1. The file would be quite large if I wrote it the way I currently do. Here is a minimal version of what I do now:

#include <iostream>
#include <fstream>
using namespace std;

int main(){
    ofstream File;
    File.open("test.csv");
    int array[4]={1,0,0,1};
    for(int i = 0; i < 4; ++i){
        File << array[i] << endl;   
    }
    File.close();
    return 0;
}

So basically, is there a way of storing this in a binary file or something similar, since my data is only 0 or 1 in the first place? If yes, how do I do this? Can I still have line breaks and maybe even commas in that file? If either of the latter does not work, that's fine too. More importantly, how do I store this as a binary file that holds only 0s and 1s so that my file is smaller? Thank you very much!

Answer 1

So basically is there a way of storing this in a binary file or something, since my data is 0 or 1 in the first place anyways? If yes, how to do this? Can i also still have line-breaks and maybe even commas in that file? If either of the latter does not work, that's also fine. Just more importantly, how to store this as a binary file which has only 0 and 1 so my file is smaller.

The obvious solution is to take 64 characters, say A-Z, a-z, 0-9, and + and /, and have each character encode six entries in your table. There is, in fact, a standard for this called Base64. In Base64, A encodes 0,0,0,0,0,0 while / encodes 1,1,1,1,1,1. Each combination of six zeroes or ones has a corresponding character.

This still leaves commas, spaces, and newlines free for your use as separators.
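As a rough illustration of the idea, here is a minimal sketch that maps each group of six entries to one character of the Base64 alphabet. The names encodeBits and decodeBits are made up for this example, and the choices of putting the first entry in the most significant bit and of zero-padding a short final group are assumptions. Note that this only borrows the Base64 alphabet; it is not a full Base64 byte encoder with = padding.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes a sequence of 0/1 values as text, six entries per character.
// The first entry of each group becomes the most significant bit.
// Assumption: a final group shorter than six entries is padded with zeros,
// so the caller has to remember the real entry count.
std::string encodeBits(const std::vector<int>& bits)
{
    std::string out;
    for (std::size_t i = 0; i < bits.size(); i += 6) {
        unsigned value = 0;
        for (std::size_t j = 0; j < 6; ++j) {
            value <<= 1;
            if (i + j < bits.size() && bits[i + j] != 0) value |= 1u;
        }
        out += kAlphabet[value];
    }
    return out;
}

// Reverses encodeBits(); count is the number of entries originally encoded.
// Characters outside the alphabet (commas, spaces, newlines) are skipped,
// so they remain usable as separators.
std::vector<int> decodeBits(const std::string& text, std::size_t count)
{
    const std::string alphabet(kAlphabet);
    std::vector<int> bits;
    for (char c : text) {
        const std::size_t value = alphabet.find(c);
        if (value == std::string::npos) continue;
        for (int j = 5; j >= 0 && bits.size() < count; --j)
            bits.push_back(static_cast<int>((value >> j) & 1u));
    }
    return bits;
}
```

With this sketch, the ten entries 1,0,0,1,1,1,1,1,1,1 become the two characters n8, and decodeBits("n8", 10) gives the original entries back. The file stays plain text, at the cost of using a whole byte for every six entries instead of every eight.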

Answer 2

If you want to store the data as compactly as possible, I'd recommend storing it as binary data, where each bit in the binary file represents one boolean value. This will allow you to store 8 boolean values for each byte of disk space you use up.

If you want to store arrays whose lengths are not multiples of 8, it gets a little bit more complicated since you can't store a partial byte, but you can solve that problem by storing an extra byte of meta-data at the end of the file that specifies how many bits of the final data-byte are valid and how many are just padding.

Something like this:

#include <iostream>
#include <fstream>
#include <cstdint>
#include <vector>

using namespace std;

// Given an array of ints that are either 1 or 0, returns a packed-array
// of uint8_t's containing those bits as compactly as possible.
vector<uint8_t> packBits(const int * array, size_t arraySize)
{
   const size_t vectorSize = ((arraySize+7)/8)+1;  // round up, then +1 for the metadata byte

   vector<uint8_t> packedBits;
   packedBits.resize(vectorSize, 0);

   // Store 8 boolean-bits into each byte of (packedBits)
   for (size_t i=0; i<arraySize; i++)
   {
      if (array[i] != 0) packedBits[i/8] |= (1<<(i%8));
   }

   // The last byte in the array is special; it holds the number of
   // valid bits that we stored to the byte just before it.
   // That way if the number of bits we saved isn't an even multiple of 8,
   // we can use this value later on to calculate exactly how many bits we should restore
   packedBits[vectorSize-1] = arraySize%8;
   return packedBits;
}

// Given a packed-bits vector (i.e. as previously returned by packBits()),
// returns the vector-of-integers that was passed to the packBits() call.
vector<int> unpackBits(const vector<uint8_t> & packedBits)
{
   vector<int> ret;
   if (packedBits.size() < 2) return ret;

   const size_t validBitsInLastByte = packedBits[packedBits.size()-1]%8;
   const size_t numValidBits        = 8*(packedBits.size()-((validBitsInLastByte>0)?2:1)) + validBitsInLastByte;

   ret.resize(numValidBits);
   for (size_t i=0; i<numValidBits; i++)
   {
      ret[i] = (packedBits[i/8] & (1<<(i%8))) ? 1 : 0;
   }
   return ret;
}

// Returns the size of the specified file in bytes, or -1 on failure
static streamoff getFileSize(ifstream & inFile)
{
   // streamoff is the standard signed stream-offset type (ssize_t is POSIX-only)
   if (inFile.is_open() == false) return -1;

   const streampos origPos = inFile.tellg();  // record current seek-position
   inFile.seekg(0, ios::end);  // seek to the end of the file
   const streamoff fileSize = inFile.tellg();   // end position == file size in bytes
   inFile.seekg(origPos);  // so we won't change the file's read-position as a side effect
   return fileSize;
}

int main(){

    // Example of packing an array-of-ints into packed-bits form and saving it
    // to a binary file
    {
       const int array[]={0,0,1,1,1,1,1,0,1,0};

       // Pack the int-array into packed-bits format
       const vector<uint8_t> packedBits = packBits(array, sizeof(array)/sizeof(array[0]));

       // Write the packed-bits to a binary file
       ofstream outFile;
       outFile.open("test.bin", ios::binary);
       outFile.write(reinterpret_cast<const char *>(&packedBits[0]), packedBits.size());
       outFile.close();
    }

    // Now we'll read the binary file back in, unpack the bits to a vector<int>,
    // and print out the contents of the vector.
    {
       // open the file for reading
       ifstream inFile;
       inFile.open("test.bin", ios::binary);

       const auto fileSizeBytes = getFileSize(inFile);  // signed, so -1 can signal failure
       if (fileSizeBytes < 0)
       {
          cerr << "Couldn't read test.bin, aborting" << endl;
          return 10;
       }

       // Read in the packed-binary data
       vector<uint8_t> packedBits;
       packedBits.resize(fileSizeBytes);
       inFile.read(reinterpret_cast<char *>(&packedBits[0]), fileSizeBytes);

       // Expand the packed-binary data back out to one-int-per-boolean
       vector<int> unpackedInts = unpackBits(packedBits);

       // Print out the int-array's contents
       cout << "Loaded-from-disk unpackedInts vector is " << unpackedInts.size() << " items long:" << endl;
       for (size_t i=0; i<unpackedInts.size(); i++) cout << unpackedInts[i] << "  ";
       cout << endl;
    }

    return 0;
}

(You could probably make the file even more compact by running zip or gzip on it after you write it out. :) )
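To put rough numbers on the saving, the sketch below compares the size of the question's text file with the size of the packed format used above. The function names are made up for this example, and the text-file figure assumes each entry is written as one digit plus a single-byte newline (no "\r\n" translation).

```cpp
#include <cstddef>

// Bytes used by the question's text format: one digit plus one '\n' per entry.
// Assumption: '\n' is written as a single byte.
constexpr std::size_t textFileSize(std::size_t entries)
{
    return entries * 2;
}

// Bytes used by the packed format above: 8 entries per byte, rounded up,
// plus the one trailing metadata byte.
constexpr std::size_t packedFileSize(std::size_t entries)
{
    return (entries + 7) / 8 + 1;
}

// The 10-entry example array: 20 bytes as text, 3 bytes packed.
static_assert(textFileSize(10) == 20, "ten entries take 20 bytes as text");
static_assert(packedFileSize(10) == 3, "ten entries pack into 3 bytes");
// One million entries: about 2 MB as text, about 125 KB packed.
static_assert(packedFileSize(1000000) == 125001, "one million entries");
```

So the packed file is roughly one sixteenth the size of the text file, before any further compression.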

Answer 3

You can indeed write and read binary data. However, having line breaks and commas would be difficult. Imagine you save your data as boolean data, so only ones and zeros. A comma would then need a special character, but you only have ones and zeros. The next best thing would be to store each entry as a pair of booleans (C++ would then read the data in pairs of bits): one holding the actual data you need, and the other saying whether you have a comma or not. But I doubt this is what you need.

If you want to do something like a CSV, then it would be easy to just fix the size of each column (for example, 4 bytes for an int and no more than 32 chars for a string) and then read and write accordingly. Suppose you have your binary

To initially save an array of objects, say pets, you would use

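// Requires <cstdio>. FILENAME, ARRAY_OF_PETS and SIZE_OF_ARRAY are placeholders.
FILE *apFile = fopen(FILENAME, "wb");  // "b" = binary mode, bytes are written unmodified
if (apFile != NULL) {
    fwrite(ARRAY_OF_PETS, sizeof(Pet), SIZE_OF_ARRAY, apFile);
    fclose(apFile);
}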

To access the pet at index idx, you would use

Pet m;
ifstream input_file(FILENAME, ios::in | ios::binary);
input_file.seekg(sizeof(Pet) * idx, ios::beg);  // jump straight to record number idx
input_file.read(reinterpret_cast<char *>(&m), sizeof(Pet));
input_file.close();

You can also add data at the end, change data in the middle, and so on.
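The fixed-size-record approach can be sketched end to end as follows. The Pet layout and the helper names (savePets, loadPet, replacePet, appendPet) are hypothetical, since the answer does not define them; the only real requirement is that the record has a fixed size and contains no pointers or std::string members, so that its raw bytes can be written and read back.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical fixed-size record: a 4-byte int and a 32-char name column.
struct Pet {
    std::int32_t age;
    char name[32];
};

// Writes all records back to back, replacing any existing file.
bool savePets(const std::string& path, const std::vector<Pet>& pets)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!pets.empty())
        out.write(reinterpret_cast<const char*>(pets.data()),
                  static_cast<std::streamsize>(pets.size() * sizeof(Pet)));
    return static_cast<bool>(out);
}

// Reads the record at index idx; returns false if it cannot be read in full.
bool loadPet(const std::string& path, std::size_t idx, Pet& pet)
{
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(idx * sizeof(Pet)), std::ios::beg);
    in.read(reinterpret_cast<char*>(&pet), sizeof(Pet));
    return in.gcount() == static_cast<std::streamsize>(sizeof(Pet));
}

// Overwrites the record at index idx in place, leaving the others untouched.
// Assumption: idx refers to a record that already exists in the file.
bool replacePet(const std::string& path, std::size_t idx, const Pet& pet)
{
    std::fstream file(path, std::ios::binary | std::ios::in | std::ios::out);
    file.seekp(static_cast<std::streamoff>(idx * sizeof(Pet)), std::ios::beg);
    file.write(reinterpret_cast<const char*>(&pet), sizeof(Pet));
    file.flush();
    return static_cast<bool>(file);
}

// Appends one record at the end of the file.
bool appendPet(const std::string& path, const Pet& pet)
{
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out.write(reinterpret_cast<const char*>(&pet), sizeof(Pet));
    out.flush();
    return static_cast<bool>(out);
}
```

Because every record occupies exactly sizeof(Pet) bytes, record idx always starts at byte offset idx * sizeof(Pet), which is what makes the seek-based access work. A file written this way is only guaranteed to be readable by a program built with the same struct layout and byte order.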

3 answers

solution1
3 2021-05-16 02:25:48

solution2
1 2021-05-16 02:58:46

solution3
0 2021-05-16 02:22:05
