
Handling last byte in Huffman compression/decompression

I have a program that builds a Huffman tree from the ASCII character frequencies read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string for any character that does not occur. The program then encodes and compresses the input into an output file, and can take that compressed file as input to decompress and decode it.

In summary, my program takes an input file, compresses and encodes it into an output file, closes the output file, reopens it as input, and writes a new output file that is supposed to contain a decoded message identical to the original text input file.

My current problem: when decoding the compressed file I get an extra character or so that is not in the original input file. As far as I know, this is due to the trash bits that pad the last byte. From my research, one solution may be to use a pseudo-EOF character to stop decoding before the trash bits are read, but I am not sure how to implement this in my current encoding and decoding functions, so all guidance and help is much appreciated.

My end goal is for this program to decode the encoded file completely, without the trash bits ending up in the output file.
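A minimal sketch of the symptom, assuming a hypothetical three-symbol code table (`kToyCodes`, `padToByte` and `decodeBits` are illustrative names, not part of the program described here). It works on strings of '0' and '1' characters rather than real bytes, only to show how zero padding in the last byte can decode into extra symbols.

```cpp
#include <map>
#include <string>

// Hypothetical code table, used only for illustration: a=0, b=10, c=11.
const std::map<std::string, char> kToyCodes = {{"0", 'a'}, {"10", 'b'}, {"11", 'c'}};

// Pad a bit string with '0' up to a whole number of bytes,
// mimicking what an encoder does when it flushes its last partial byte.
std::string padToByte(std::string bits) {
    while (bits.size() % 8 != 0) bits += '0';
    return bits;
}

// Decode greedily; this is valid because no code is a prefix of another.
std::string decodeBits(const std::string& bits) {
    std::string out, current;
    for (char bit : bits) {
        current += bit;
        auto it = kToyCodes.find(current);
        if (it != kToyCodes.end()) {
            out += it->second;
            current.clear();
        }
    }
    return out;
}
```

With this table, "abc" encodes to the five bits 01011. Padding adds three zero bits, and each of them happens to be a valid code for 'a', so the padded stream decodes to "abcaaa".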

Below are two functions, encodeOutput and decodeOutput, that handle the compression and decompression.

(For the encodeOutput function, fileName is the input file parameter and fileName2 is the output file parameter.)

(For the decodeOutput function, fileName2 is the input file parameter and fileName3 is the output file parameter.)

code[256] is a parameter of both functions and holds the Huffman code for each unique character read from the original input file. For example, the character 'H' may have the code "111" stored in code[72] at the time the array is passed to the functions.

freq[256] holds the frequency of each ASCII character read, or 0 if the character is not in the original input file.

void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile; //to read file
    ifile.open(fileName, ios::binary);
    if (!ifile)//to check if file is open or not
    {
        die("Can't read again"); // function that exits program if can't open
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read; 
    read = ifile.get(); //read one char from file and store it in int
    char buffer = 0, bit_count = 0;
    while (read != -1) {//run this loop until reached to end of file(-1)
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0';
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }

    if (bit_count != 0)
        ofile << char(buffer << (8 - bit_count));

    ifile.close();
    ofile.close();
}
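
The bit-packing step of the encoder can be looked at in isolation. The following is a minimal sketch, not the code above: `packBits` and its `paddingBits` out-parameter are hypothetical names, and it works on an in-memory string of '0'/'1' characters instead of a file. It reports how many padding bits end up in the final byte, which are exactly the bits a decoder must not interpret.

```cpp
#include <string>

// Pack a string of '0'/'1' characters into bytes, most significant bit first.
// The unused low bits of the final byte are zero padding; their count is
// returned through paddingBits (a hypothetical out-parameter for illustration).
std::string packBits(const std::string& bits, unsigned& paddingBits) {
    std::string bytes;
    unsigned char buffer = 0;   // unsigned avoids sign issues when shifting
    unsigned bitCount = 0;
    for (char bit : bits) {
        buffer = static_cast<unsigned char>((buffer << 1) | (bit != '0'));
        if (++bitCount == 8) {  // a full byte is ready
            bytes += static_cast<char>(buffer);
            buffer = 0;
            bitCount = 0;
        }
    }
    paddingBits = 0;
    if (bitCount != 0) {        // flush the partial last byte, left-aligned
        paddingBits = 8 - bitCount;
        bytes += static_cast<char>(buffer << paddingBits);
    }
    return bytes;
}
```

For the five-bit input "01011" this yields the single byte 0x58 with three padding bits.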

void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open decoding output file");
    }
    priority_queue < node > q;
    for (unsigned i = 0; i < 256; i++) {
        if (freq[i] == 0) {
            code[i] = "";
        }
    }

    for (unsigned i = 0; i < 256; i++)
        if (freq[i])
            q.push(node(unsigned(i), freq[i]));

    if (q.size() < 1) {
        die("no data");
    }

    while (q.size() > 1) {
        node *child0 = new node(q.top());
        q.pop();
        node *child1 = new node(q.top());
        q.pop();
        q.push(node(child0, child1));
    } // created the tree
    string answer = "";
    const node * temp = &q.top(); // root 
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { //reading 8 bits at a time 
            if (((c >> p) & 1) == 0) { // if bit is a 0 (compare with the integer 0, not the character '0')
                temp = temp->child0; // go left
            }
            else { // if bit is a 1
                temp = temp->child1; // go right
            }
            if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
            {
                answer += temp->value;
                temp = &q.top();
            }
        }
    }
    ofile << answer; // write the decoded text
}
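
Since decodeOutput already receives freq, one possible alternative to a pseudo-EOF is to stop after as many symbols as the encoder wrote, which is the sum of all frequencies. The sketch below is only an illustration under stated assumptions: `Node` is a hypothetical stand-in for the node type (which is not shown here), `totalSymbols` and `decodeCounted` are made-up names, the input is an in-memory byte string, and the tree is assumed to contain at least two distinct symbols.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the question's node type, which is not shown.
struct Node {
    int value;           // symbol stored at a leaf
    const Node* child0;  // followed on a 0 bit, null at a leaf
    const Node* child1;  // followed on a 1 bit, null at a leaf
};

// Number of symbols the encoder wrote, i.e. the sum of all frequencies.
unsigned long long totalSymbols(const unsigned long long freq[256]) {
    unsigned long long total = 0;
    for (int i = 0; i < 256; ++i) total += freq[i];
    return total;
}

// Decode bytes most significant bit first and stop after `remaining` symbols,
// so the padding bits in the last byte are never interpreted.
// Assumes the root is an internal node (at least two distinct symbols).
std::string decodeCounted(const std::string& bytes, const Node* root,
                          unsigned long long remaining) {
    std::string out;
    const Node* cur = root;
    for (std::size_t i = 0; i < bytes.size() && remaining > 0; ++i) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        for (int p = 7; p >= 0 && remaining > 0; --p) {
            cur = ((c >> p) & 1) ? cur->child1 : cur->child0;
            if (!cur->child0 && !cur->child1) {  // leaf reached
                out += static_cast<char>(cur->value);
                cur = root;
                --remaining;
            }
        }
    }
    return out;
}
```

This relies on the decoder having the exact frequency table the encoder used; it is a different technique from the pseudo-EOF approach.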

Change the arrays to freq[257] and code[257], and set freq[256] to one. Your EOF is symbol 256, and it will appear exactly once in the stream, at the end. At the end of your encoding, send symbol 256. When you receive symbol 256 while decoding, stop.
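
The steps above can be sketched as a small in-memory round trip. This is a minimal illustration under assumptions, not a drop-in replacement for encodeOutput and decodeOutput: the index-based `Node`, `buildTree`, `assignCodes`, `encode`, `decode` and `roundTrip` are hypothetical names, and strings stand in for the files. Encoder and decoder must build the tree from the same 257-entry frequency table, including freq[256] = 1.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

constexpr int kEof = 256;      // pseudo-EOF symbol, one past the last byte value
constexpr int kSymbols = 257;

struct Node {
    unsigned long long weight;
    int value;   // symbol at a leaf, -1 for an internal node
    int child0;  // index of the 0-branch child, -1 at a leaf
    int child1;  // index of the 1-branch child, -1 at a leaf
};

// Build the Huffman tree inside `nodes` and return the index of the root.
int buildTree(const std::vector<unsigned long long>& freq, std::vector<Node>& nodes) {
    using Item = std::pair<unsigned long long, int>;  // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q;
    for (int s = 0; s < kSymbols; ++s) {
        if (freq[s] == 0) continue;
        nodes.push_back({freq[s], s, -1, -1});
        q.push({freq[s], static_cast<int>(nodes.size()) - 1});
    }
    while (q.size() > 1) {  // repeatedly merge the two lightest nodes
        Item a = q.top(); q.pop();
        Item b = q.top(); q.pop();
        nodes.push_back({a.first + b.first, -1, a.second, b.second});
        q.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }
    return q.top().second;
}

// Walk the tree and record the bit string leading to each leaf.
void assignCodes(const std::vector<Node>& nodes, int idx, const std::string& prefix,
                 std::vector<std::string>& code) {
    const Node& n = nodes[idx];
    if (n.child0 < 0) { code[n.value] = prefix; return; }
    assignCodes(nodes, n.child0, prefix + '0', code);
    assignCodes(nodes, n.child1, prefix + '1', code);
}

std::string encode(const std::string& text, const std::vector<std::string>& code) {
    std::string bits;
    for (unsigned char ch : text) bits += code[ch];
    bits += code[kEof];                        // send symbol 256 at the very end
    while (bits.size() % 8 != 0) bits += '0';  // zero padding up to a whole byte
    std::string bytes;
    for (std::size_t i = 0; i < bits.size(); i += 8) {
        unsigned char b = 0;
        for (int k = 0; k < 8; ++k)
            b = static_cast<unsigned char>((b << 1) | (bits[i + k] == '1'));
        bytes += static_cast<char>(b);
    }
    return bytes;
}

std::string decode(const std::string& bytes, const std::vector<Node>& nodes, int root) {
    std::string out;
    if (nodes[root].child0 < 0) return out;  // only the EOF symbol exists: empty input
    int cur = root;
    for (unsigned char c : bytes) {
        for (int p = 7; p >= 0; --p) {
            cur = ((c >> p) & 1) ? nodes[cur].child1 : nodes[cur].child0;
            if (nodes[cur].child0 < 0) {                   // reached a leaf
                if (nodes[cur].value == kEof) return out;  // stop before the padding bits
                out += static_cast<char>(nodes[cur].value);
                cur = root;
            }
        }
    }
    return out;
}

// Count frequencies, add the pseudo-EOF, then encode and decode again.
std::string roundTrip(const std::string& text) {
    std::vector<unsigned long long> freq(kSymbols, 0);
    for (unsigned char ch : text) ++freq[ch];
    freq[kEof] = 1;
    std::vector<Node> nodes;
    int root = buildTree(freq, nodes);
    std::vector<std::string> code(kSymbols);
    assignCodes(nodes, root, "", code);
    return decode(encode(text, code), nodes, root);
}
```

Because the decoder returns as soon as it reaches the leaf for symbol 256, whatever padding follows in the last byte is never turned into output.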


 