
Huffman Decoding Compressed File

I have a program that builds a Huffman tree from the ASCII character frequencies read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string for any character that does not appear in the input. This program also encodes the input and writes a compressed output file.

I am now trying to decompress and decode that output file. It is opened as the input file, and a new output file should contain the decoded message, identical to the original text input file.

My thought process for this part of my assignment is to work backwards from the encoding function I have written: read 8 bits at a time and decode the message by updating a variable (string n), which starts out empty, while recursing through the Huffman tree until I reach a character to write to the output file.

I have started the function, but I am stuck and am looking for some guidance in writing my decodeOutput function. All help is appreciated.
My completed encodeOutput function and my work-in-progress decodeOutput function are below:

(For the encodeOutput function, fileName is the input file parameter and fileName2 is the output file parameter.)

(For the decodeOutput function, fileName2 is the input file parameter and fileName3 is the output file parameter.)

code[256] is a parameter of both functions and holds the Huffman code for each unique character read from the original input file. For example, if the character 'H' appears in the input file, it may have the code "111" stored in code[72] at the time the array is passed to the functions.

void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile;//to read file
    ifile.open(fileName, ios::binary);
    if (!ifile) //to check if file is open or not
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read;
    read = ifile.get();//read one char from file and store it in int
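    unsigned char buffer = 0; // bits collected so far for the next output byte
    int bit_count = 0;        // number of bits currently held in buffer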
    while (read != -1) {
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0'; 
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }

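    // Pad the last partial byte with zero bits. put() writes one raw byte;
    // using << here would print the promoted int as decimal text instead.
    if (bit_count != 0)
        ofile.put(static_cast<char>(buffer << (8 - bit_count)));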

    ifile.close();
    ofile.close();
}

//Work in progress
void decodeOutput(const string & fileName2, const string & fileName3, string code[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open decoding output file");
    }
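    string n = ""; // bits read since the last decoded character, as '0'/'1' characters
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { // most significant bit first, the order encodeOutput wrote them
            // (c >> p & 1) is the integer 0 or 1, never the character '0' or '1'
            if ((c >> p & 1) == 0) { // if bit is a 0
                n += '0';
            }
            else { // if bit is a 1
                n += '1';
            }
            // Still to do: if n now equals one of the codes, write that character to ofile and clear n
        }
    }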
}

The decoding would be easier if you had the original Huffman tree used to construct the codebook. But suppose you only have the codebook (i.e., the string code[256]) and not the original Huffman tree. What you can do is the following:

  • Partition the codebook into groups of codewords of equal length. Say the codewords have n distinct lengths: L_0 < L_1 < ... < L_{n-1}.
  • Read (but do not consume yet) k bits from the input file, with k increasing from L_0 up to L_{n-1}, until the k input bits match a codeword of length k = L_i for some i.
  • Output the 8-bit character corresponding to the matching codeword, and consume the k bits from the input file.
  • Repeat until all bits from the input file are consumed.
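The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the input bits have already been unpacked from the file into a string of '0'/'1' characters (most significant bit of each byte first), and the names CodeGroups, groupByLength and decodeBits are hypothetical helpers, not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Codewords grouped by length. std::map keeps its keys sorted, so iterating
// over it visits the lengths in increasing order, L_0 up to L_{n-1}.
using CodeGroups = std::map<std::size_t, std::unordered_map<std::string, char>>;

// Hypothetical helper: build the groups from the 256-entry codebook.
CodeGroups groupByLength(const std::string code[256]) {
    CodeGroups groups;
    for (int ch = 0; ch < 256; ++ch) {
        const std::string& w = code[ch];
        if (w.empty()) continue; // character never appeared in the input
        bool fresh = groups[w.size()].emplace(w, static_cast<char>(ch)).second;
        assert(fresh && "two characters share a codeword");
        (void)fresh;
    }
    return groups;
}

// Hypothetical helper: decode a string of '0'/'1' characters.
// Trailing bits that do not form a complete codeword are dropped.
std::string decodeBits(const std::string& bits, const CodeGroups& groups) {
    std::string out;
    std::size_t pos = 0;
    while (pos < bits.size()) {
        bool matched = false;
        for (const auto& group : groups) {           // shortest length first
            const std::size_t len = group.first;
            if (pos + len > bits.size()) break;      // not enough bits left
            auto it = group.second.find(bits.substr(pos, len)); // peek at k bits
            if (it != group.second.end()) {
                out += it->second;                   // output the character
                pos += len;                          // consume the k bits
                matched = true;
                break;
            }
        }
        if (!matched) break; // leftover bits match no codeword
    }
    return out;
}
```

With the codebook 'a' = "0", 'b' = "10", 'c' = "11", calling decodeBits("010110", groups) yields "abca". Note that the zero bits used to pad the last byte of the compressed file can themselves look like valid codewords (here each padding 0 would decode as 'a'), so a real decoder probably also needs the original character count or an end-of-data marker; the sketch does not handle that.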

If the codebook was constructed correctly, and you always look up the codewords in order of increasing length, you should never encounter a sequence of input bits for which you cannot find a matching codeword.

Effectively, in terms of the equivalent Huffman tree, every time you compare k input bits with the group of codewords of length k, you are checking whether a leaf at tree level k holds a codeword matching the input; every time you increase k to the next longer group of codewords, you are walking down the tree to a deeper level (taking level 0 to be the root).
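To make the tree equivalence concrete, an equivalent tree can be rebuilt from the codebook alone and then walked one bit at a time. The sketch below is one possible way to do that, not necessarily how the original program represents its tree; Node, buildTree and decodeWithTree are hypothetical names, and the input is again assumed to be a string of '0'/'1' characters.

```cpp
#include <cassert>
#include <memory>
#include <string>

// One node of the rebuilt code tree.
struct Node {
    int symbol = -1;                // -1 for internal nodes, 0..255 for leaves
    std::unique_ptr<Node> child[2]; // child[0] follows bit 0, child[1] follows bit 1
};

// Hypothetical helper: insert every non-empty codeword as a root-to-leaf path.
std::unique_ptr<Node> buildTree(const std::string code[256]) {
    auto root = std::make_unique<Node>();
    for (int ch = 0; ch < 256; ++ch) {
        if (code[ch].empty()) continue;
        Node* cur = root.get();
        for (char bit : code[ch]) {
            // Catches a codeword that extends one inserted earlier.
            assert(cur->symbol < 0 && "codeword passes through a leaf");
            const int b = (bit != '0');
            if (!cur->child[b]) cur->child[b] = std::make_unique<Node>();
            cur = cur->child[b].get();
        }
        cur->symbol = ch; // the end of the path is the leaf for this character
    }
    return root;
}

// Hypothetical helper: walk down one level per bit, emit a character at each
// leaf, then restart from the root. An unfinished path at the end is dropped.
std::string decodeWithTree(const std::string& bits, const Node& root) {
    std::string out;
    const Node* cur = &root;
    for (char bit : bits) {
        cur = cur->child[bit != '0'].get();
        if (!cur) break;            // bit sequence is not in the codebook
        if (cur->symbol >= 0) {     // reached a leaf
            out += static_cast<char>(cur->symbol);
            cur = &root;
        }
    }
    return out;
}
```

Each bit moves exactly one level down, so decoding a character costs one step per bit of its codeword, instead of one table lookup per candidate length.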


 