
Decoding a Huffman Tree from a File

I'm trying to decode a Huffman code.

I have a dictionary of characters, each given as an int value followed by its binary code, and after the dictionary comes the binary encoding of the message. It looks like this:

10,000;121,001;13,010;33,011;100,1000;32,1001;104,1010;101,1011;111,1100;108,1101;119,1110;114,1111;10101011001100111101100111111011000011011010000

... where numbers such as 10, 121, 13 and 33 are the int values of characters, the bits next to each one are that character's binary code, and the final sequence of 1s and 0s is the encoded message.

After reading it from a text file, I split it into a string array so that I can build a HashMap with the char as the key and the binary code as the value.
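For reference, a minimal sketch of this parsing step follows. It is not the code from the question: the class name HuffmanTableParser and its method names are hypothetical, and it assumes the whole file content is a single line in which every ';'-separated field except the last is a "charValue,bits" pair. It stores the bit string as the key, which is the direction a decoder needs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HuffmanTableParser {

    /** Returns a map from bit string to character, e.g. "1010" -> 'h'. */
    static Map<String, Character> parseTable(String line) {
        String[] fields = line.split(";");
        Map<String, Character> codeToChar = new LinkedHashMap<>();
        // Every field except the last one is a dictionary entry.
        for (int i = 0; i < fields.length - 1; i++) {
            String[] pair = fields[i].split(",");
            codeToChar.put(pair[1], (char) Integer.parseInt(pair[0]));
        }
        return codeToChar;
    }

    /** Returns the encoded message, i.e. the text after the last ';'. */
    static String parseMessage(String line) {
        return line.substring(line.lastIndexOf(';') + 1);
    }

    public static void main(String[] args) {
        // Sample line copied from the question.
        String line = "10,000;121,001;13,010;33,011;100,1000;32,1001;"
                + "104,1010;101,1011;111,1100;108,1101;119,1110;114,1111;"
                + "10101011001100111101100111111011000011011010000";
        System.out.println(parseTable(line).size()); // prints 12
        System.out.println(parseMessage(line));      // prints the bit message
    }
}
```

With the sample line there are 13 fields: 12 dictionary entries and the message, so the loop runs up to fields.length - 1. The loop in the method below stops at auxSplit2.length - 2, which, if the file matches the sample, appears to leave the last entry (114, the letter r) out of the map.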

Then I store the entries in a list of nodes so that they are easy to access. The problem is this:

When I try to convert the binary message to char using the dictionary, I get a message like this:

1!y1y111!y11111!

When it should be this:

hey world!!

This is the method I'm using:

void decompress() throws HuffmanException, IOException {
    File file = FilesManager.chooseUncompressedFile();
    if (file == null) {
        throw new HuffmanException("No file");
    }
    FileReader read = new FileReader(file);
    BufferedReader buff = new BufferedReader(read);
    String auxText;
    StringBuilder compressFromFile = new StringBuilder();
    do {
        auxText = buff.readLine();
        if (auxText != null) {
            compressFromFile.append(auxText);
        }
    } while (auxText != null);
    String[] auxSplit1 = compressFromFile.toString().split(" ");
    String rest1 = auxSplit1[1];
    String[] auxSplit2 = rest1.split(";");
    System.out.println(auxSplit2[2]);
    HashMap<Integer, String> map = new HashMap<>();
    String[] tomapAux;
    for (int i = 0; i < auxSplit2.length - 2; i++) {
        tomapAux = auxSplit2[i].split(",");

        map.put(Integer.valueOf(tomapAux[0]), tomapAux[1]);
    }
    ArrayList<CharCode> charCodeArrayList = new ArrayList<>();

    map.forEach((k, v) -> charCodeArrayList.add(new CharCode((char) k.intValue(), v)));

    charCodeArrayList.sort(new Comparator<CharCode>() {
        @Override
        public int compare(CharCode o1, CharCode o2) {
            return extractInt(o1.getCode()) - extractInt(o2.getCode());
        }

        int extractInt(String s) {
            String num = s.replaceAll("\\D", "");
            return num.isEmpty() ? 0 : Integer.parseInt(num);
        }
    });

    for (int i = 0; i < charCodeArrayList.size(); i++) {
        System.out.println("Pos " + i + " char: " + charCodeArrayList.get(i).getChr() + " code: " + charCodeArrayList.get(i).getCode());
    }
    String st = auxSplit2[auxSplit2.length - 1];
    System.out.println("before: " + st);
    String newChar = String.valueOf(charCodeArrayList.get(0).getChr());
    String oldChar = charCodeArrayList.get(0).getCode();
    for (CharCode aCharCodeArrayList : charCodeArrayList) {
        st = st.replace(oldChar, newChar);
        newChar = String.valueOf(aCharCodeArrayList.getChr());
        oldChar = aCharCodeArrayList.getCode();
    }
    System.out.println("after : " +st);

}

And this is the CharCode class:

public class CharCode implements Comparable<CharCode> {
    private char chr;
    private String code;

    public CharCode(char chr, String code) {
        this.chr = chr;
        this.code = code;
    }

    public char getChr() {
        return chr;
    }

    public String getCode() {
        return code;
    }

    @Override
    public int compareTo(CharCode cc) {
        return ((int) this.chr) - ((int) cc.getChr());
    }
}

And this is what I see in the console:

[Image: screenshot of the console output]

So if anyone can help me improve my method so that I get hey world!! and not 1!y1y111!y11111! !!01, that would be great!

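The problem with your program is that it decodes the wrong way: it takes the first Huffman code and replaces all of its occurrences in the string, then does the same with the next Huffman code, and so on. A global replace also matches bit patterns that straddle the boundary between two codes, so the output gets corrupted.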
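To make the failure concrete, here is a small sketch of replace-based decoding on a hypothetical three-symbol table (a=0, b=10, c=11), which is not the table from the question. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReplaceDecodingDemo {

    /** Decodes by global replacement, one code at a time (the flawed approach). */
    static String decodeByReplace(Map<String, Character> codeToChar, String bits) {
        String result = bits;
        for (Map.Entry<String, Character> e : codeToChar.entrySet()) {
            // String.replace substitutes every occurrence, wherever it sits.
            result = result.replace(e.getKey(), String.valueOf(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical table: a=0, b=10, c=11.
        Map<String, Character> table = new LinkedHashMap<>();
        table.put("0", 'a');
        table.put("10", 'b');
        table.put("11", 'c');

        // "abca" encodes to 0 + 10 + 11 + 0.
        String bits = "010110";
        // Replacing "0" first also consumes the 0 that belongs to "10".
        System.out.println(decodeByReplace(table, bits)); // prints "a1aca"
    }
}
```

The first pass turns 010110 into a1a11a, so the code 10 is never seen as a unit and a stray 1 is left in the output, much like the stray 1s in the question's result.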

That is not how a Huffman-encoded string is decoded. To decode it, you check whether a PREFIX of the string equals some Huffman code, by comparing the start of the string with the Huffman codes one by one.
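Prefix matching gives a unique answer because a valid Huffman table is prefix-free: no code is the start of another code, so at most one entry can match the front of the string. A small sketch of a check for that property follows; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class PrefixFreeCheck {

    /** Returns true if no code in the list is a prefix of a different entry. */
    static boolean isPrefixFree(List<String> codes) {
        for (int i = 0; i < codes.size(); i++) {
            for (int j = 0; j < codes.size(); j++) {
                // A duplicate code also counts as a violation.
                if (i != j && codes.get(j).startsWith(codes.get(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The twelve codes from the question's dictionary.
        List<String> codes = Arrays.asList(
                "000", "001", "010", "011", "1000", "1001",
                "1010", "1011", "1100", "1101", "1110", "1111");
        System.out.println(isPrefixFree(codes)); // prints "true"
    }
}
```

If this check failed for a table, the order in which the codes are tried would change the decoded result, which would point to a corrupted dictionary rather than a decoding bug.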

In your case:
iteration 1: 10101011001100111101100111111011000011011010000
we check 000 - not a prefix
we check 001 - not a prefix
we check 010 - not a prefix
we check 011 - not a prefix
we check 1000 - not a prefix
we check 1001 - not a prefix
we check 1010 - found a prefix! and it corresponds to letter h

Now we remove this prefix from the original string and so our string is
1011001100111101100111111011000011011010000

iteration 2: 1011001100111101100111111011000011011010000
suitable prefix is 1011 which is letter e

iteration 3: 001100111101100111111011000011011010000
suitable prefix is 001 which is letter y

iteration 4: 100111101100111111011000011011010000
suitable prefix is 1001 which is the space character

and so on, until nothing remains of the original string.

The modified code looks as follows:

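StringBuilder decoded = new StringBuilder();
while (!st.isEmpty()) {
    boolean matched = false;
    // Find the code that is a prefix of the remaining bits.
    for (CharCode cc : charCodeArrayList) {
        if (st.startsWith(cc.getCode())) {
            decoded.append(cc.getChr());
            // Drop the matched prefix and continue with the rest.
            st = st.substring(cc.getCode().length());
            matched = true;
            break;
        }
    }
    if (!matched) {
        // Without this guard the loop would never end on malformed input.
        throw new HuffmanException("No code matches: " + st);
    }
}
System.out.println("after : " + decoded);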
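The loop above rescans the whole dictionary and copies the remaining string for every decoded character. An alternative, sketched here as a different technique and not as part of the original answer, is to rebuild the code tree as a binary trie and walk it one bit at a time. The class HuffmanTrieDecoder and its members are hypothetical names, and any character other than '0' is treated as a 1 bit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HuffmanTrieDecoder {

    /** One node of the code tree; a leaf carries the decoded character. */
    private static final class Node {
        Node zero;      // child reached by bit '0'
        Node one;       // child reached by any other bit character
        Character chr;  // non-null only at a leaf
    }

    private final Node root = new Node();

    /** Builds the tree from a map of character -> bit string. */
    HuffmanTrieDecoder(Map<Character, String> codes) {
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            Node node = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (node.zero == null) node.zero = new Node();
                    node = node.zero;
                } else {
                    if (node.one == null) node.one = new Node();
                    node = node.one;
                }
            }
            node.chr = e.getKey();
        }
    }

    /** Walks the tree bit by bit, emitting a character at every leaf. */
    String decode(String bits) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = (bits.charAt(i) == '0') ? node.zero : node.one;
            if (node == null) {
                throw new IllegalArgumentException("No code matches at bit " + i);
            }
            if (node.chr != null) {
                out.append(node.chr);
                node = root; // start the next code word
            }
        }
        if (node != root) {
            throw new IllegalArgumentException("Message ends in the middle of a code");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Dictionary and message copied from the question.
        int[] chars = {10, 121, 13, 33, 100, 32, 104, 101, 111, 108, 119, 114};
        String[] codes = {"000", "001", "010", "011", "1000", "1001",
                          "1010", "1011", "1100", "1101", "1110", "1111"};
        Map<Character, String> table = new LinkedHashMap<>();
        for (int i = 0; i < chars.length; i++) {
            table.put((char) chars[i], codes[i]);
        }
        String message = "10101011001100111101100111111011000011011010000";
        // trim() drops the trailing "\r\n" (values 13 and 10) for display.
        System.out.println(new HuffmanTrieDecoder(table).decode(message).trim());
    }
}
```

Each bit is visited once, so the work grows linearly with the message length, and malformed input raises an exception.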
