简体   繁体   中英

Canonical huffman encoding algo

Hello I am trying to implement Canonical huffman encoding but i dont understand wiki and google guides, I need explain more abstractly...

I tried this: 1. Get list of regular huffman encoding length's codes. like this:

A - code: 110, length: 3.
B - code: 111, length: 3.
C - code: 10, length 2.
D - code: 01, length 2.
E - code: 00, length 2.
  1. I sorting the table by symbol and length like this:
 C - code: 10, length 2. D - code: 01, length 2. E - code: 00, length 2. A - code: 110, length: 3. B - code: 111, length: 3. 

now i dont know how to proceed...

tnx a lot

Throw out the codes you get from the Huffman algorithm. You don't need those. Just keep the lengths.

Now assign the codes based on the lengths and the symbols. Sort by length, from shortest to longest, and within each length, sort the symbols in ascending order. (How you do that exactly doesn't matter, so long as every symbol is strictly less than or greater than any other symbol, and the encoder and decoder agree on how to do it.)

So we do the ordering:

C - 2
D - 2
E - 2
A - 3
B - 3

Two's come before three's, and within the 2's, C, D, E are in order, and within the 3's, A, B are in order.

Now we assign the code in integer order within each length, adding a zero bit at the end each time we go up a length:

C - 2 - 00
D - 2 - 01
E - 2 - 10
A - 3 - 110 <- after incrementing to 11, a zero was added to make 110
B - 3 - 111

That is a canonical code.

You could do it other ways if you like and still be canonical, eg counting backwards from 11, so long as the encoder and decoder agree on the approach. The whole point is to only have to transmit the lengths for each symbol from the encoder to the decoder, so as to not have to transmit the codes themselves which take more space.

You should sort symbols by there frequency, so most often would be on top and least often would be on bottom. (Overall frequency - 1):

A (0.5)
B (0.2)
C (0.15)
D (0.15)

Then mark one symbol with 0 and other with 1 , summ there frequencies and insert into proper position in list and again mark two least with 0 and 1 :

A (0.5)      A (0.5)
B (0.2)      C&D (0.3) 0
C (0.15) 0   B (0.2)   1
D (0.15) 1

And again...

A (0.5)      A (0.5)        A (0.5)      0
B (0.2)      C&D (0.3) 0    B&C&D (0.5)  1
C (0.15) 0   B (0.2)   1
D (0.15) 1

Until you obtain last pair. The path, marked by 0 and 1 from tail to symbol would be corresponding Huffman code:

A 0
B 11
C 100
D 101

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM