
How can I write binary bits into a binary file in C?

I am trying to implement Huffman encoding in C. I have finished the tree construction and obtain the codeword for each symbol as the algorithm proceeds. Now I am stuck on writing the codewords for the corresponding symbols into a binary file. Can someone suggest how the codewords (binary bits) can be written into a binary file so that I obtain the compressed file?

The codewords are of variable length.

A function to write these bits to the file, and one to read them back, would be helpful.

This is the code I have written:

void create_compressed_file()
{
    char str[20], ch, *str2, str1[10], str_arr[6], str3[10];
    FILE *fp, *fp2, *fp3;
    int i, array[20], j = 0;
    fp2 = fopen("newfile.txt", "r"); // contains the original text file
    fp3 = fopen("codeword.txt", "r"); // contains the symbol and codeword
    while (fscanf(fp2, "%s", &str) == 1) {
        rewind(fp3);
        str2 = strtok(str, "-");
        while (str2 != NULL) {
            strcpy(str_arr, str2);
            printf("str2= %s ", str_arr); //str2 stores the symbol(not char but a string)
            printf(" %s-", str2);
            while (fscanf(fp3, "%s", &str1) == 1) {
                if (strcmp(str1, str_arr) == 0) {
                    fscanf(fp3, "%s", &str1); // extract the symbol's codeword (1s and 0s) into str1
                    printf("%s\n", str1);
                    write_codeword_to_binaryfile(); // the function I want to write; it is incomplete and I need help with it
                }
            }
            str2 = strtok(NULL, "-");
            rewind(fp3);
        }
        printf("\nspace:");
        strcpy(str_arr, "space");
        while (fscanf(fp3, "%s", &str1) == 1) {
            if (strcmp(str1, str_arr) == 0) {
                fscanf(fp3, "\n%s", &str1); // extract the codeword for the space character
                printf("%s\n", str1);
            }
        }
    }
    fclose(fp2);
    fclose(fp3);    
}

codeword.txt:

is  0000
por 00010
Plain   000110
most    0001110
the 0001111
ted 00100 
text    00101
ly  0011000
near    0011001
pli 0011010
ap  0011011
ble 0011100
ta  0011101
by  0011110
sup 0011111
cryp    0100000
In  0100001
ra  0100010
tog 0100011
ting    0100100
tain    0100101
mands   0100110
com 0100111
mes 0101000
to  0101001
ge  0101010
sa  0101011
plain   0101100
phy 0101101

I tried to implement the function as shown below, but it didn't write anything. The file size after execution was 0 bytes:

#include <stdio.h>
#include <conio.h>
#include <stdint.h>

void write_codeword_to_binaryfile(
    const char *codeword, // codeword to write, in ASCII format
    FILE *file,           // destination file
    uint8_t *buffer,
    int *fullness)
{
    char c;
    //  fullness = ;
    *buffer = 0;
    for (c = *codeword++; c != '\0'; c = *codeword++) // iterate
    {
        int bit = c - '0'; // convert from ASCII to binary 0/1
        *buffer |= bit << (7 - fullness);
        ++fullness;
    }
    fputc(*buffer, file);
}

int main() {
    FILE *fp;
    uint8_t *buffer = 0;
    char *c = "10101010";
    char b = 0;
    int i;
    fp = fopen("myfile.bin", "wb");
    write_codeword_to_binaryfile(c, fp, buffer, 8);
    fclose(fp);
    getch();
}

First of all, you should open your file in binary mode:

fp = fopen("myfile", "wb"); // "b" means "binary"

This is a must on Windows, but not necessary on most other platforms (you don't need to do anything special to differentiate the platform; just use "wb").
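To illustrate why the mode matters, here is a minimal sketch. The helper name write_raw_bytes is made up for this example and is not part of the question's code. It writes raw bytes in binary mode and checks for errors. On Windows, a text-mode stream would expand the byte 0x0A to 0x0D 0x0A, which would corrupt compressed data; with "wb" the bytes are written unchanged.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `len` raw bytes to `path` in binary mode.
   Returns 0 on success, -1 on failure. */
int write_raw_bytes(const char *path, const unsigned char *data, size_t len)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "wb");   /* "b": no newline translation */
    if (fp == NULL) {
        perror(path);               /* report why the file could not be opened */
        return -1;
    }

    size_t written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return 0;
}
```

Checking the result of fopen is worth doing in any case: the code in the question would crash on a NULL FILE pointer if the file could not be opened.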

To write bits into the file, you should use a buffer: a partially filled byte. Write the buffer to the file when it fills up (contains exactly 8 filled bits).

uint8_t buffer = 0;

You should use a counter that tracks how many bits are filled.

int fullness = 0;

Your function, which writes to a file, should receive the buffer and its fullness. Since it will change them, you actually have to pass pointers:

void write_codeword_to_binaryfile(
    const char *codeword, // codeword to write, in ASCII format
    FILE *file,           // destination file
    uint8_t *buffer,
    int *fullness)
{
    for (char c = *codeword++; c != '\0'; c = *codeword++) // iterate
    {
        int bit = c - '0'; // convert from ASCII to binary 0/1
        ...
    }
}

There are two ways to arrange bits in a byte: little-endian (the first bit is the least-significant bit) or big-endian (the first bit is the most-significant bit). The customary way is to use big-endian ordering.

So if your buffer has a certain number of bits filled, how do you fill the next bit? The following example shows a buffer with 5 bits filled:

01101...
     ^
next bit to fill (its bit position, counting from the least-significant bit as 0, is 2)

As you can see from this example, the position of the next bit is 7 - fullness. So, for each bit, do the following:

*buffer |= bit << (7 - *fullness);
++*fullness; // increment the counter itself, not the pointer

See "How do you set, clear and toggle a single bit in C/C++?" for more info.
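As a small illustration of the shift formula, the sketch below packs up to 8 ASCII '0'/'1' characters into a single byte, most-significant bit first. The function name pack_bits_msb_first is hypothetical; it handles only one byte and does no file I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs at most 8 bits, given as ASCII '0'/'1'
   characters, into one byte. The first character ends up in the
   most-significant bit; unused low bits stay 0. */
uint8_t pack_bits_msb_first(const char *bits)
{
    uint8_t buffer = 0;
    int fullness = 0;               /* number of bits filled so far */

    for (; *bits != '\0'; ++bits) {
        assert(*bits == '0' || *bits == '1');
        assert(fullness < 8);       /* this sketch handles one byte only */
        int bit = *bits - '0';      /* convert from ASCII to 0/1 */
        buffer |= (uint8_t)(bit << (7 - fullness));
        ++fullness;
    }
    return buffer;
}
```

With this helper, pack_bits_msb_first("10101010") gives 0xAA, and the 5-bit input "01101" gives 0x68 (binary 01101000), with the three unfilled low bits left at zero.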

When the buffer is full (fullness is equal to 8), write it to the file:

fputc(*buffer, file);
*fullness = 0;
*buffer = 0;

You should also "flush" the buffer (i.e. write it to the file) when you have finished encoding your message:

if (*fullness > 0)
    fputc(*buffer, file);
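Putting the pieces above together, a complete version of the function might look like the following sketch. The flush_bits helper is a name introduced here for the final flush step. Note that the buffer and the counter must be initialised to 0 by the caller once, before the first codeword, and must not be reset inside the function, because a byte usually spans several codewords.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Appends the bits of `codeword` (ASCII '0'/'1') to `file`, MSB first.
   `buffer` and `fullness` carry the partially filled byte between calls. */
void write_codeword_to_binaryfile(const char *codeword, FILE *file,
                                  uint8_t *buffer, int *fullness)
{
    for (char c = *codeword++; c != '\0'; c = *codeword++) {
        assert(c == '0' || c == '1');
        int bit = c - '0';          /* convert from ASCII to 0/1 */
        *buffer |= (uint8_t)(bit << (7 - *fullness));
        ++*fullness;
        if (*fullness == 8) {       /* byte complete: write it out */
            fputc(*buffer, file);
            *fullness = 0;
            *buffer = 0;
        }
    }
}

/* Hypothetical helper: writes the last partial byte, if any.
   The unused low bits are left as 0. */
void flush_bits(FILE *file, uint8_t *buffer, int *fullness)
{
    if (*fullness > 0) {
        fputc(*buffer, file);
        *fullness = 0;
        *buffer = 0;
    }
}
```

A caller would declare uint8_t buffer = 0; and int fullness = 0;, pass &buffer and &fullness on every call, and call flush_bits once at the end. For example, writing the codewords 0000, 00010 and 000110 from codeword.txt produces 15 bits, which end up in the file as the two bytes 0x01 and 0x0C (the last bit of the second byte is padding).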

By the way, what happens at the end of the message is a common non-trivial problem for bit-level encoders. You should think about it from the point of view of the decoder: it needs to know how many bits to decode in the last byte of the file. There are several solutions for this:

  • After encoding your message, encode an additional 1 bit, and then zero bits until the buffer is full. The decoder has to strip this padding by scanning backwards from the end: first the zero bits, then the 1 bit. This is used by MPEG.
  • Write the length of the message, in bits, in the file's header. This is probably the simplest solution, although it requires updating the file's beginning after finishing the encoding.
  • Have a special codeword for "end of message" (also often used).
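The length-in-header option can be sketched as follows, together with a bit reader for the decoding side. Everything here is an assumption made for illustration: the function names, and the header layout of a 32-bit bit count stored most-significant byte first. It is not a standard format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical header: a 32-bit bit count, most-significant byte first. */
void write_bit_count(FILE *file, uint32_t bit_count)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        fputc((int)((bit_count >> shift) & 0xFF), file);
}

/* Reads the header back. Returns 0 on success, -1 if the file is too short. */
int read_bit_count(FILE *file, uint32_t *bit_count)
{
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        int byte = fgetc(file);
        if (byte == EOF)
            return -1;
        value = (value << 8) | (uint32_t)byte;
    }
    *bit_count = value;
    return 0;
}

/* Reads the next bit, MSB first. `buffer` holds the current byte and
   `remaining` counts its unread bits (start with 0).
   Returns 0 or 1, or -1 at end of file. */
int read_bit(FILE *file, uint8_t *buffer, int *remaining)
{
    assert(*remaining >= 0 && *remaining <= 8);
    if (*remaining == 0) {          /* current byte used up: fetch the next */
        int byte = fgetc(file);
        if (byte == EOF)
            return -1;
        *buffer = (uint8_t)byte;
        *remaining = 8;
    }
    --*remaining;
    return (*buffer >> *remaining) & 1;
}
```

Under these assumptions, the encoder would first call write_bit_count(file, 0) as a placeholder, write and flush all codewords while counting the bits, then call fseek(file, 0, SEEK_SET) and write_bit_count again with the real total. The decoder calls read_bit_count and then read_bit exactly that many times, walking the Huffman tree with each bit, so the padding bits in the last byte are never interpreted.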
