
How to write a binary file in C++

I'm trying to implement the Huffman encoding algorithm in C++.

My question is: once I have the equivalent binary string for each character, how can I write those zeros and ones to a file as actual bits, rather than as the characters '0' and '1'?

Thanks in advance.

Obtaining the encoding of each character individually, in a separate data structure, is a broken solution: you need to juxtapose the encodings in the resulting binary file, and storing them individually makes that just as hard as storing them contiguously in a vector of bits in the first place.

This consideration suggests using a std::vector<bool> for the task, but that is also a broken solution, because it can't be treated as a C-style array, and you really need that at output time.

This question asks precisely what the valid alternatives to std::vector<bool> are, so I think the answers to it fit your question perfectly.

BTW, what I would do is just wrap a std::vector<uint8_t> in a class that suits your needs, like the code attached:

#include <iostream>
#include <vector>
#include <cstdint>
#include <algorithm>
class bitstream {
private:
    std::vector<std::uint8_t> storage;
    unsigned int bits_used:3;
    void alloc_space();
public:
    bitstream() : bits_used(0) { }

    void push_bit(bool bit);

    template <typename T>
    void push(T t);

    std::uint8_t *get_array();

    size_t size() const;

    // beware: no reference!
    bool operator[](size_t pos) const;
};

void bitstream::alloc_space()
{
    if (bits_used == 0) {
        std::uint8_t push = 0;
        storage.push_back(push);
    }
}

void bitstream::push_bit(bool bit)
{
    alloc_space();
    storage.back() |= bit << (7 - bits_used++);
}

template <typename T>
void bitstream::push(T t)
{
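    // View the value as raw bytes, in the machine's native byte order.
    std::uint8_t *t_byte = reinterpret_cast<std::uint8_t*>(&t);
    for (size_t i = 0; i < sizeof(t); i++) {
        std::uint8_t byte = t_byte[i];
        if (bits_used > 0) {
            // The high (8 - bits_used) bits fill the free part of the last byte...
            storage.back() |= byte >> bits_used;
            // ...and the low bits_used bits start a new byte, from its top.
            std::uint8_t to_push = static_cast<std::uint8_t>(
                (byte & ((1 << bits_used) - 1)) << (8 - bits_used));
            storage.push_back(to_push);
        } else {
            storage.push_back(byte);
        }
    }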
}

std::uint8_t *bitstream::get_array()
{
    return &storage.front();
}

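size_t bitstream::size() const
{
    if (storage.empty())
        return 0;
    // bits_used == 0 with non-empty storage means the last byte is full.
    if (bits_used == 0)
        return storage.size() * 8;
    return (storage.size() - 1) * 8 + bits_used;
}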

bool bitstream::operator[](size_t size) const
{
    // No range checking
    return static_cast<bool>((storage[size / 8] >> 7 - (size % 8)) & 0x1);
}

int main(int argc, char **argv)
{
    bitstream bs;
    bs.push_bit(true);
    std::cout << bs[0] << std::endl;
    bs.push_bit(false);
    std::cout << bs[0] << "," << bs[1] << std::endl;
    bs.push_bit(true);
    bs.push_bit(true);
    std::uint8_t to_push = 0xF0;
    bs.push(to_push);
    for (size_t i = 0; i < bs.size(); i++)
        std::cout << bs[i] << ",";
    std::cout << std::endl;
}

I hope this code can help you.
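The class above packs the bits, but the answer stops before the actual file write. Here is a minimal sketch of that last step, assuming the packed bytes are available as a std::vector<std::uint8_t>. The helper names write_bytes and read_bytes are made up for illustration and are not part of the original answer.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes raw bytes to `path` in binary mode.
// Returns true if every byte was written.
bool write_bytes(const std::string& path, const std::vector<std::uint8_t>& bytes)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;
    // write() takes a const char*, so the byte pointer needs a cast.
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: reads the whole file back as raw bytes.
std::vector<std::uint8_t> read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    assert(in && "file must exist and be readable");
    std::vector<std::uint8_t> bytes;
    char c;
    while (in.get(c))
        bytes.push_back(static_cast<std::uint8_t>(c));
    return bytes;
}
```

Reading the file back with read_bytes is a quick way to check that what landed on disk is the packed bytes and not their textual representation.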

  • You start from a sequence of bytes, each holding a 1 or a 0, that represents the contiguous encoding of every character of the input file.
  • You take every byte of the sequence and add the corresponding bit to a temporary byte (char byte).
  • Every time you fill a byte, you write it to the file (for efficiency, you could also wait until you have accumulated more data).
  • At the end, you write the remaining bits to the file, padded with trailing zeros, for example.
  • As akappa correctly pointed out, the else branch can be removed if byte is set to 0 after each file write (or, more generally, every time it has been completely filled and flushed somewhere else), so that only the 1s need to be written.

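#include <ostream>

// huffmanEncoding holds one bit per element, with the numeric value 0 or 1
// (not the characters '0' and '1'). The out parameter is an assumption that
// stands in for the original writeByteToFile placeholder.
void writeBinary(const char *huffmanEncoding, int sequenceLength, std::ostream &out)
{
    unsigned char byte = 0;
    // For each bit of the sequence
    for (int i = 0; i < sequenceLength; i++) {
        char bit = huffmanEncoding[i];

        // Add a single bit to byte
        if (bit == 1) {
            // MSB of the sequence to MSB of the file
            byte |= (1 << (7 - (i % 8)));
        }
        else {
            // Redundant as long as byte is reset to 0 after every write
            byte &= ~(1 << (7 - (i % 8)));
        }

        // The byte is full once its eighth bit has been placed
        if ((i % 8) == 7) {
            out.put(static_cast<char>(byte));
            byte = 0;
        }
    }

    // Write the last incomplete byte, if any; its unused bits are already 0
    if (sequenceLength % 8 != 0) {
        out.put(static_cast<char>(byte));
    }
}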
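The question describes the encoding as a string of '0' and '1' characters, rather than bytes holding the numeric values 0 and 1. A rough sketch of the same packing for that input could look like this. The name pack_bits is hypothetical, and MSB-first order with zero padding is one possible convention, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: packs a string of '0'/'1' characters into bytes.
// The first character goes into the most significant bit of the first byte;
// the last byte is padded with trailing zero bits.
std::vector<std::uint8_t> pack_bits(const std::string& bits)
{
    // Round the bit count up to whole bytes, all initialised to 0.
    std::vector<std::uint8_t> out((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        assert(bits[i] == '0' || bits[i] == '1');
        // Bytes start at 0, so only the 1s need to be written.
        if (bits[i] == '1')
            out[i / 8] |= static_cast<std::uint8_t>(1u << (7 - i % 8));
    }
    return out;
}
```

For example, pack_bits("101") yields the single byte 0xA0 (binary 10100000). Note that a decoder cannot tell the padding zeros from real data unless the number of valid bits is stored somewhere as well.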

You can't write individual bits to a binary file; the smallest unit of data you can write is one byte (that is, 8 bits).

So what you should do is create a buffer (of any size).

char BitBuffer = 0;

Writing to a buffer:

int Location;  // bit position to write, 0..7
bool Value;    // bit value to store

if (Value)
    BitBuffer |= (1 << Location);
else
    BitBuffer &= ~(1 << Location);

The expression (1 << Location) produces a number whose bits are all 0 except the one at the position given by Location. Then, if Value is true, the corresponding bit in BitBuffer is set to 1; otherwise it is cleared to 0. The bitwise operations used are fairly simple; if you don't understand them, any good C++ book or tutorial covers them.

Location should be a number in the range <0, 8 * sizeof(BitBuffer) - 1>, so <0, 7> in this case.
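To make the masking concrete: with Location = 3, the expression 1 << Location is 00001000 in binary. OR-ing with it sets that bit, and AND-ing with its complement 11110111 clears it. The function below is only an illustrative wrapper around the same two operations; the name with_bit is made up.

```cpp
#include <cassert>

// Hypothetical helper: returns `buffer` with the bit at `location`
// (0 = least significant) set to `value`.
unsigned char with_bit(unsigned char buffer, int location, bool value)
{
    assert(location >= 0 && location < 8);
    const unsigned char mask = static_cast<unsigned char>(1u << location);
    return value ? static_cast<unsigned char>(buffer | mask)    // set the bit
                 : static_cast<unsigned char>(buffer & ~mask);  // clear the bit
}
```

So with_bit(0x00, 3, true) gives 0x08, and with_bit(0xFF, 0, false) gives 0xFE.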

Writing the buffer to a file is relatively simple with fstream. Just remember to open the file in binary mode.

std::ofstream File;
File.open("file.txt", std::ios::out | std::ios::binary);
File.write(&BitBuffer, sizeof(char));

EDIT: Noticed a bug and fixed it.

EDIT2: The << operators write formatted text even when the stream is opened in binary mode, so they can't be used to write raw bytes. I forgot about that.
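The difference between << and the unformatted functions is easy to see on a string stream. This is just a small demonstration, with made-up function names, of why write() or put() is the right tool here:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formatted output: operator<< converts the number to decimal text.
std::string formatted(int value)
{
    std::ostringstream out(std::ios::binary);
    out << value;
    return out.str();
}

// Unformatted output: put() stores the value as one raw byte.
std::string raw(int value)
{
    assert(value >= 0 && value <= 255);
    std::ostringstream out(std::ios::binary);
    out.put(static_cast<char>(value));
    return out.str();
}
```

On an ASCII system, formatted(65) produces the two characters "65", while raw(65) produces the single byte 0x41, which is the letter A.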

Alternative solution: use std::vector<bool> or std::bitset as a buffer.

This should be even simpler, but I thought I could help you a little bit more.

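#include <cstddef>
#include <fstream>
#include <vector>

void WriteData(std::vector<bool> const& data, std::ofstream& str)
{
    char Buffer = 0;
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        // Location = i % 8; Value = data[i].
        // Buffer is 0 after each write, so only set bits need to be stored.
        if (data[i])
            Buffer |= static_cast<char>(1 << (i % 8));
        // Write the buffer out once all 8 bits have been filled.
        if (i % 8 == 7)
        {
            str.write(&Buffer, 1);
            Buffer = 0;
        }
    }
    // If data.size() % 8 != 0, the last buffer is only partially filled;
    // its unused bits are trailing zeros.
    if (data.size() % 8 != 0)
        str.write(&Buffer, 1);
}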
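The std::bitset option mentioned above could look roughly like this. It is a sketch, not a tested drop-in: the name PackWithBitset is hypothetical, the function returns the packed bytes instead of writing them so that it stays independent of any particular stream, and it keeps the same bit order as WriteData (bit i % 8 of each byte, least significant first).

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant: uses std::bitset<8> as the one-byte buffer and
// returns the packed bytes, ready to be passed to ofstream::write.
std::vector<unsigned char> PackWithBitset(const std::vector<bool>& data)
{
    std::vector<unsigned char> out;
    std::bitset<8> buffer;                 // all bits start at 0
    for (std::size_t i = 0; i < data.size(); ++i) {
        buffer[i % 8] = data[i];           // bit 0 is the least significant
        if (i % 8 == 7) {                  // buffer full: flush and clear
            out.push_back(static_cast<unsigned char>(buffer.to_ulong()));
            buffer.reset();
        }
    }
    if (data.size() % 8 != 0)              // partial last byte, zero padded
        out.push_back(static_cast<unsigned char>(buffer.to_ulong()));
    assert(out.size() == (data.size() + 7) / 8);
    return out;
}
```

The bitset replaces the manual shifting and masking with indexed assignment, at the cost of one to_ulong() conversion per byte.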
