How can I write a bit instead of a byte to a file?

I am using the Huffman algorithm to develop a file compressor, and right now I am facing the following problem.

Applying the algorithm to the word stackoverflow, I get the following result:

a,c,e,f,k,l,r,s,t,v,w = 1 occurrence each
o = 2 occurrences

a,c,e,f,k,l,r,s,t,v,w = 7.69231%
and
o = 15.3846%

So I start inserting them into a binary tree, which gives me these codes:

o=00
a=010
e=0110
c=0111
t=1000
s=1001
w=1010
v=1011
k=1100
f=1101
r=1110
l=1111

Each code is the path to the character in the tree, with 0 meaning left and 1 meaning right.

Then the word "stackoverflow" becomes: 10011000010011111000010110110111011011111001010

I want to write that whole value into a binary file as bits. It is 47 bits, which should fit in 6 bytes, but instead I can only make it 47 bytes, because the minimum I can put into a file with fwrite or fprintf, using sizeof(something), is 1 byte.

So my question is: how can I write only a single bit to my file?
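As a check on the numbers above, here is a minimal sketch that encodes the word with the code table from the question and counts the bits. The names code_for and encode_as_text are hypothetical helpers introduced for illustration; the output is still text ('0' and '1' characters), not packed bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Code table copied from the question; any other character has no code. */
static const char *code_for(char ch)
{
    switch (ch) {
    case 'o': return "00";
    case 'a': return "010";
    case 'e': return "0110";
    case 'c': return "0111";
    case 't': return "1000";
    case 's': return "1001";
    case 'w': return "1010";
    case 'v': return "1011";
    case 'k': return "1100";
    case 'f': return "1101";
    case 'r': return "1110";
    case 'l': return "1111";
    default:  return NULL;
    }
}

/* Append the code of every character of word to out as '0'/'1' text.
   Returns the number of bits written, or 0 if a character has no code
   or out is too small. */
size_t encode_as_text(const char *word, char *out, size_t out_size)
{
    size_t used = 0;

    assert(word != NULL && out != NULL);
    if (out_size == 0)
        return 0;

    for (; *word != '\0'; ++word) {
        const char *code = code_for(*word);
        size_t len;

        if (code == NULL)
            return 0;
        len = strlen(code);
        if (used + len + 1 > out_size)   /* keep room for the terminator */
            return 0;
        memcpy(out + used, code, len);
        used += len;
    }
    out[used] = '\0';
    return used;
}
```

Called with "stackoverflow" and a 64-byte buffer, this returns 47, and (47 + 7) / 8 gives the 6 bytes needed once the bits are packed.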

Just write a "header" to the file (the number of bits), then "pack" the bits into bytes, padding the last one. Here's a sample.

#include <stdio.h>

FILE* f;

/* how many bits in current byte */
int bit_counter;
/* current byte */
unsigned char cur_byte;

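/* append one bit (0 or 1); a byte is written once 8 bits have been collected */
void write_bit(unsigned char bit)
{
    /* add the bit first, so a byte is only written when it holds 8 bits */
    cur_byte <<= 1;
    cur_byte |= (bit & 1);

    if(++bit_counter == 8)
    {
        fwrite(&cur_byte,1,1,f);
        bit_counter = 0;
        cur_byte = 0;
    }
}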

int main()
{
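    /* "wb": binary mode, so the bytes are written without newline translation */
    f = fopen("test.bits", "wb");
    if(f == NULL)
        return 1;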

    cur_byte = 0;
    bit_counter = 0;

    /* write the number of bits here to decode the bitstream later (47 in your case) */
    /* int num = 47; */
    /* fwrite(&num, sizeof num, 1, f); */

    write_bit(1);
    write_bit(0);
    write_bit(0);
    /* etc...  - do this in a loop for each encoded character */
    /* 10011000010011111000010110110111011011111001010 */

    if(bit_counter > 0)
    {
         // pad the last byte with zeroes
         cur_byte <<= 8 - bit_counter;
         fwrite(&cur_byte, 1, 1, f);
    }

    fclose(f);

    return 0;
}

To build the full Huffman encoder you'll also have to write the code table at the beginning of the file, of course, so that the decoder can rebuild it.
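The packing step can also be tried in memory before any file is involved. The sketch below is only an illustration: pack_bits is a hypothetical helper that takes the bits as a string of '0' and '1' characters and packs them most significant bit first, padding the last byte with zeros, which is the same convention as the sample above.

```c
#include <assert.h>
#include <stddef.h>

/* Pack a string of '0'/'1' characters into bytes, most significant bit first.
   The last byte is padded with zero bits. Returns the number of bytes
   produced, or 0 if out is too small or bits contains another character. */
size_t pack_bits(const char *bits, unsigned char *out, size_t out_size)
{
    unsigned char cur = 0;   /* byte being filled */
    int filled = 0;          /* how many bits cur holds, 0..7 */
    size_t n = 0;            /* bytes written to out */

    assert(bits != NULL && out != NULL);

    for (; *bits != '\0'; ++bits) {
        if (*bits != '0' && *bits != '1')
            return 0;
        cur = (unsigned char)((cur << 1) | (*bits == '1'));
        if (++filled == 8) {
            if (n == out_size)
                return 0;
            out[n++] = cur;
            cur = 0;
            filled = 0;
        }
    }

    if (filled > 0) {
        /* move the leftover bits to the top of the byte; low bits stay 0 */
        if (n == out_size)
            return 0;
        out[n++] = (unsigned char)(cur << (8 - filled));
    }
    return n;
}
```

For the 47-bit encoding of "stackoverflow" (10011000010011111000010110110111011011111001010) this produces 6 bytes: 98 4F 85 B7 6F 94 in hex, where the final byte carries 7 data bits and 1 padding bit.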

This is sort of an encoding issue. The problem is that files can only contain bytes, so if you write 1 and 0 as text they end up in the file as '1' and '0': the characters for 1 and 0, each of which is a whole byte.

What you'll have to do is pack the bits into bytes, creating a file that contains the bits in a set of bytes. You won't be able to read the file in a text editor: it doesn't know you want each bit displayed as a 1 or 0 character, so it will display whatever each packed byte turns out to be. You could open it with an editor that understands how to work with binary files, though. For instance, vim can do this.

As for extra trailing bytes or an end-of-file marker, you're going to have to create some sort of encoding convention. For example, you can pack and pad with extra zeros, like you mention in your comments, but then by convention have the first N bytes be metadata, e.g. the data length: how many bits in your file are meaningful. This sort of thing is very common.
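One possible convention, sketched under assumptions: the bit count is stored as 4 bytes, most significant byte first, so the file does not depend on the machine's int size or byte order. The 4-byte width and the names put_bit_count, get_bit_count and get_bit are choices made for this example, not something the answers prescribe. A decoder would read the header, then call get_bit only for indexes below the stored count, which skips the padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store bit_count in hdr as 4 bytes, most significant byte first. */
void put_bit_count(unsigned char hdr[4], uint32_t bit_count)
{
    assert(hdr != NULL);
    hdr[0] = (unsigned char)(bit_count >> 24);
    hdr[1] = (unsigned char)(bit_count >> 16);
    hdr[2] = (unsigned char)(bit_count >> 8);
    hdr[3] = (unsigned char)bit_count;
}

/* Rebuild the bit count from a header written by put_bit_count. */
uint32_t get_bit_count(const unsigned char hdr[4])
{
    assert(hdr != NULL);
    return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16)
         | ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
}

/* Return bit number index (0 = first bit written) from data that was
   packed most significant bit first. The caller must keep index below
   the bit count stored in the header. */
int get_bit(const unsigned char *data, size_t index)
{
    assert(data != NULL);
    return (data[index / 8] >> (7 - index % 8)) & 1;
}
```

The header bytes can be written with fwrite(hdr, 1, 4, f) before the packed data and read back with fread on the decoding side.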

You'll need to manage this yourself, by buffering the bits to write and only actually writing data when you have a complete byte. Something like...

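 #include <stdbool.h>
 #include <stdio.h>

 /* out is assumed to be a stream already opened for binary writing */
 void writeBit(FILE *out, bool b)
 {
   static unsigned char buffer=0;
   static int bitcount=0;

   buffer = (unsigned char)((buffer << 1) | (b ? 1:0));

   if (++bitcount == 8)
   {
     fputc(buffer, out); // write out the byte
     bitcount = 0;
     buffer = 0;
   }
 }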

The above isn't reentrant (and is likely to be pretty inefficient), and you need to make sure you somehow flush any half-written byte at the end (by writing an extra 7 zero bits, maybe), but you should get the general idea.
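One way to address both caveats, as a rough sketch rather than a definitive implementation: keep the state in a struct instead of static variables, so several streams can be written independently, and add an explicit flush that pads the last byte with zero bits. BitWriter, bw_init, bw_put and bw_flush are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit writer: the state lives in a struct, not in statics. */
typedef struct {
    FILE *out;             /* stream opened for binary writing */
    unsigned char buffer;  /* bits collected so far, in the low bits */
    int bitcount;          /* number of bits held in buffer, 0..7 */
} BitWriter;

void bw_init(BitWriter *bw, FILE *out)
{
    assert(bw != NULL && out != NULL);
    bw->out = out;
    bw->buffer = 0;
    bw->bitcount = 0;
}

/* Append one bit; returns 0 on a write error, 1 otherwise. */
int bw_put(BitWriter *bw, int bit)
{
    bw->buffer = (unsigned char)((bw->buffer << 1) | (bit ? 1 : 0));
    if (++bw->bitcount == 8) {
        int ok = fputc(bw->buffer, bw->out) != EOF;
        bw->buffer = 0;
        bw->bitcount = 0;
        return ok;
    }
    return 1;
}

/* Pad a partial byte with zero bits and write it out.
   Does nothing if the output is already byte-aligned. */
int bw_flush(BitWriter *bw)
{
    while (bw->bitcount != 0) {
        if (!bw_put(bw, 0))
            return 0;
    }
    return 1;
}
```

Call bw_flush once after the last bw_put and before fclose; without it, up to 7 trailing bits would never reach the file.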
