
Converting a byte into bit and writing the binary data to file

Suppose I have a character array, char a[8], containing 10101010. If I store this data in a .txt file, the file is 8 bytes in size. How can I convert this data to binary and save it as 8 bits (not 8 bytes), so that the file is only 1 byte in size?

Also, once I have converted these 8 bytes into a single byte, which file format should I save the output in: .txt, .dat, or .bin?

I am working on Huffman encoding of text files. I have already converted the text into binary, i.e. 0's and 1's, but when I store this output in a file, each digit (1 or 0) takes a byte instead of a bit. I want a solution in which each digit takes only a bit.

char buf[100];
void build_code(node n, char *s, int len)
{
    static char *out = buf;
    if (n->c) {
        s[len] = 0;
        strcpy(out, s);
        code[n->c] = out;
        out += len + 1;
        return;
    }

    s[len] = '0'; build_code(n->left,  s, len + 1);
    s[len] = '1'; build_code(n->right, s, len + 1);
}

This is how I build up my codes with the help of a Huffman tree. And

void encode(const char *s, char *out)
{
    while (*s) {
        strcpy(out, code[*s]);
        out += strlen(code[*s++]);
    }
}

This is how I encode to get the final output.

I'm not entirely sure how you end up with a string holding the binary representation of a value, but you can get an integer from a string (in any base) using standard functions such as std::strtoul.

That function returns an unsigned long. Since you know your value is in the range 0-255, you can store it in an unsigned char:

unsigned char v=(unsigned char)(std::strtoul(binary_string_value.c_str(),0,2) & 0xff);

To write it to disk, you can use a std::ofstream opened in binary mode (std::ios::binary).
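A minimal sketch of those two steps together, assuming the input string holds at most 8 characters that are all '0' or '1'. The function name write_bits_as_byte and the file path passed to it are hypothetical, chosen only for illustration:

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Hypothetical helper: parses a string of '0'/'1' characters as a base-2
// number, keeps the low 8 bits, and writes them to `path` as a single byte.
// Returns false if the file could not be opened or written.
bool write_bits_as_byte(const std::string &bits, const std::string &path)
{
    unsigned char v = static_cast<unsigned char>(
        std::strtoul(bits.c_str(), nullptr, 2) & 0xff);

    // Binary mode prevents any newline translation of the byte value.
    std::ofstream file(path, std::ios::binary);
    if (!file) {
        return false;
    }
    file.put(static_cast<char>(v));
    return static_cast<bool>(file);
}
```

Calling write_bits_as_byte("10101010", "out.bin") should leave a file that is exactly one byte long and holds the value 0xAA.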

Which File format should I save the output in? .txt or .dat or .bin?

Keep in mind that the extension (.txt, .dat or .bin) does not really mandate the format (i.e. the structure of the contents). The extension is a convention commonly used to indicate that you're using some well-known format (and in some OSes and environments, it determines which program is configured to handle the file). Since this is your file, it is up to you to define the actual format, and to name the file with whatever extension (or even no extension) best represents your contents, as long as it is meaningful to you and to those who are going to consume your files.
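As one illustration of defining your own format: a Huffman bit stream is rarely a whole number of bytes long, so a home-grown layout often records how many bits are valid. The sketch below builds such a file image in memory. The layout (a 4-byte little-endian bit count followed by the packed bytes) and the name make_file_image are assumptions made for this example, not an established format:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout: [bit count, 4 bytes, little-endian][packed payload].
// `payload` is assumed to already contain the bits packed 8 per byte.
std::vector<unsigned char> make_file_image(std::uint32_t bit_count,
                                           const std::vector<unsigned char> &payload)
{
    std::vector<unsigned char> image;
    image.reserve(4 + payload.size());

    // Emit the bit count one byte at a time, least significant byte first.
    for (int shift = 0; shift < 32; shift += 8) {
        image.push_back(static_cast<unsigned char>((bit_count >> shift) & 0xffu));
    }

    // Append the packed bits unchanged.
    image.insert(image.end(), payload.begin(), payload.end());
    return image;
}
```

A reader of this layout would read the first four bytes to learn how many of the following bits are meaningful and ignore any padding in the last byte.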

Edit: additional details

Assume we have a buffer of some length in which you're storing your string of '0' and '1' characters:

int codeSize;       // number of '0'/'1' characters in the code buffer
char *code;         // code array/pointer
std::ofstream file; // file stream we're writing to (open it with std::ios::binary)

// One byte per 8 characters, plus one more if there is a partial group.
unsigned char *byteArray = new unsigned char[codeSize / 8 + ((codeSize % 8 != 0) ? 1 : 0)];
int bytes = 0;
int i; // declared outside the loop so it can be checked afterwards
for (i = 8; i <= codeSize; i += 8) {
    std::string binstring(code + i - 8, 8); // temp string from an 8-character slice of the code
    byteArray[bytes++] = (unsigned char)(std::strtoul(binstring.c_str(), 0, 2) & 0xff);
}

if (i - 8 < codeSize) {
    // At this point the number of bits is not a multiple of 8, so
    // there are some bits that have not been written.
    // Not sure how you would like to handle it.
    // One option is to pad with 0 bits up to the next multiple of 8,
    // but it all depends on what you're representing.
}

file.write(reinterpret_cast<const char *>(byteArray), bytes);
delete[] byteArray;
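A sketch of the zero-padding option mentioned in the comment above, written with explicit bit operations instead of std::strtoul. The name pack_bits is hypothetical, and the code assumes every character of the input is either '0' or '1'. Because the padding bits cannot be told apart from real data, the decoder would also need the original bit count from somewhere else:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs a string of '0'/'1' characters into bytes,
// most significant bit first. If the length is not a multiple of 8, the
// unused low bits of the last byte are left as 0.
std::vector<unsigned char> pack_bits(const std::string &bits)
{
    // Round the byte count up so that a partial group still gets a byte.
    std::vector<unsigned char> out((bits.size() + 7) / 8, 0);

    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] == '1') {
            // Bit 0 of each group of 8 goes into the 0x80 position.
            out[i / 8] |= static_cast<unsigned char>(0x80u >> (i % 8));
        }
    }
    return out;
}
```

For the input "101" this yields the single byte 0xA0 (binary 10100000), where the last five bits are padding.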

A function that converts 8 input chars, each representing one bit, into a single byte:

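// Returns unsigned char so that values of 128 and above are representable
// even where plain char is signed.
unsigned char BitsToByte( const char in[8] )
{
    unsigned char ret = 0;
    for( int i=0, pow=128;
         i<8;
         ++i, pow/=2
        )
        if( in[i] == '1' ) ret += pow;
    return ret;
}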

We iterate over the array passed to the function (of size 8, since a byte has 8 bits) and increase the return value based on its contents. The first element of the array represents the most significant bit, so pow starts at 128 = 2^7; in general, the n-th bit, counting from the least significant, has the value 2^(n-1).

One way:

/** Converts 8 bytes to 8 bits **/
unsigned char BinStrToNum(const char a[8])
   {
   /* Each conditional is parenthesized because + binds tighter than ?: */
   return(  (('1' == a[0]) ? 128 : 0)
          + (('1' == a[1]) ? 64  : 0)
          + (('1' == a[2]) ? 32  : 0)
          + (('1' == a[3]) ? 16  : 0)
          + (('1' == a[4]) ? 8   : 0)
          + (('1' == a[5]) ? 4   : 0)
          + (('1' == a[6]) ? 2   : 0)
          + (('1' == a[7]) ? 1   : 0)
          );
   }

Save it in any of the formats you mentioned; or invent your own!

#include <stdio.h>
#include <errno.h>

int main()
   {
   int rCode=0;
   const char *a = "10101010";
   unsigned char byte;
   FILE *fp=NULL;

   /* "wb" opens the file for writing in binary mode. */
   fp=fopen("data.xyz", "wb");
   if(NULL==fp)
      {
      rCode=errno;
      fprintf(stderr, "fopen() failed. errno:%d\n", errno);
      goto CLEANUP;
      }

   byte=BinStrToNum(a);
   fwrite(&byte, 1, 1, fp);

CLEANUP:

   if(fp)
      fclose(fp);

   return(rCode);
   }

You can shift them into a byte pretty easily, like this:

byte x = (s[3] - '0') + ((s[2] - '0') << 1) + ((s[1] - '0') << 2) + ((s[0] - '0') << 3);

In my example I only shifted a nibble, i.e. 4 bits. You can expand the example to shift an entire byte. This avoids a loop, which may make it faster, although an optimizing compiler will often unroll such a short fixed-length loop anyway.
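Expanded to a full byte, the same shifting idea might look like the sketch below. The function name bits_to_byte_unrolled is hypothetical, and the code assumes that each of the 8 characters is either '0' or '1', with s[0] as the most significant bit:

```cpp
// Hypothetical helper: combines 8 '0'/'1' characters into one byte using
// shifts only. Subtracting '0' turns each character into the integer 0 or 1.
unsigned char bits_to_byte_unrolled(const char s[8])
{
    return static_cast<unsigned char>(
          ((s[0] - '0') << 7)
        | ((s[1] - '0') << 6)
        | ((s[2] - '0') << 5)
        | ((s[3] - '0') << 4)
        | ((s[4] - '0') << 3)
        | ((s[5] - '0') << 2)
        | ((s[6] - '0') << 1)
        |  (s[7] - '0'));
}
```

With the question's input "10101010", this returns 0xAA (decimal 170).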
