
How to handle codes longer than 8 bits in Huffman coding?

The input characters for the compression, with their frequencies, are:

A = 1
B = 2
C = 4
D = 8
E = 16
F = 32
G = 64
H = 128
I = 256
J = 512
K = 1024
L = 2048
M = 4096
N = 8192

The Huffman coding algorithm is:

First we pick the two characters with the lowest frequencies and join them into a tree whose parent holds the sum of those two frequencies. We repeat this, treating each parent as a new entry, until a single tree remains. After that we label every left branch 0 and every right branch 1. Finally, the binary code for each character is read off by starting from the root and following the path to that character: each step to the left adds a 0, each step to the right adds a 1.
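The steps just described can be sketched in Python as follows. This is only an illustration, not a reference implementation: the function name `huffman_codes` and the tuple-based tree representation are assumptions made for this example. Run on the frequencies from the question, it shows how long the codes become.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {symbol: bit string} for a non-empty {symbol: frequency} mapping."""
    tiebreak = count()  # unique counter so heap entries never compare trees
    # Each heap entry is (frequency, tiebreak, tree). A tree is either a
    # symbol (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest frequency
        f2, _, right = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")      # left branch adds a 0
            walk(tree[1], prefix + "1")      # right branch adds a 1
        else:
            codes[tree] = prefix or "0"      # single-symbol input still gets one bit
    walk(heap[0][2], "")
    return codes

# The question's input: A=1, B=2, C=4, ..., N=8192
freqs = {chr(ord("A") + i): 2 ** i for i in range(14)}
codes = huffman_codes(freqs)
lengths = {sym: len(code) for sym, code in codes.items()}
```

Because every frequency here is larger than the sum of all smaller ones, the tree degenerates into a chain: `N` gets a 1-bit code, while `A` and `B` end up 13 levels deep.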

For this input the tree grows deeper than 8 levels, so some codes are longer than 8 bits. But we have to write each code in 8 bits only. What should we do here?

If you encode all 256 possible values, some will be represented by more than 8 bits, that's right. But your encoded string isn't interpreted as an array of bytes, but as a series of bits, in which a single code may span more than one byte, so it is okay to have branches of your Huffman tree that go deeper than eight levels.

Say you have a Huffman tree that contains these encodings (among others):

E          000               # 3 bits
X          0100000001        # 10 bits
NUL        001               # 3 bits

Now when you want to encode the string EEXEEEX, followed by a terminating NUL, you get:

E   E   X          E   E   E   X          NUL      # original text
000 000 0100000001 000 000 000 0100000001 001      # encoded bits

You now organise this series of bits into blocks of 8, that is bytes:

eeeE EExx    xxxx xxxx    EEEe eeEE    Exxx xxxx    xxxN NN      # orig
0000 0001    0000 0001    0000 0000    0010 0000    0010 0100    # bits
enc[0]       enc[1]       enc[2]       enc[3]       enc[4]       # bytes

(The sub-blocks of four are just for easy reading. The last two zero bits are padding.) The byte array enc is now your encoded string.
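As a rough sketch of that packing step, the snippet below joins the codes for each symbol and cuts the resulting bit series into bytes, most significant bit first. The helper name `pack_bits` is made up for this example, and the code table contains only the three entries shown above, not a full Huffman table.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes, most significant bit first."""
    out = bytearray()
    current = 0   # value of the byte being built
    filled = 0    # number of bits already placed in `current`
    for bit in bits:
        current = (current << 1) | (1 if bit == "1" else 0)
        filled += 1
        if filled == 8:
            out.append(current)
            current, filled = 0, 0
    if filled:
        # Pad the last, partially filled byte with zero bits on the right.
        out.append(current << (8 - filled))
    return bytes(out)

# Only the three example codes from above (an assumption for illustration).
codes = {"E": "000", "X": "0100000001", "\0": "001"}
text = "EEXEEEX\0"
enc = pack_bits("".join(codes[ch] for ch in text))
print([format(b, "08b") for b in enc])
# prints ['00000001', '00000001', '00000000', '00100000', '00100100']
```

The 38 code bits fill four whole bytes plus six bits of the fifth, which is why two padding bits are appended.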

The compression comes from the fact that frequently used characters occupy less than a byte. For example, the first two Es fit into a single byte. Infrequent characters like X here have a longer encoding, which may even span several bytes.

You must, of course, extract the current bit from the current byte in order to traverse your Huffman tree. You'll need the bitwise operators for that.
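One possible way to do that, sketched in Python: shift and mask each byte to get its bits one at a time, and step down the tree on every bit until a leaf is reached. The names `build_tree` and `decode`, the nested-dict tree, and the use of NUL as an end marker are assumptions for this example. The tree is built from the three example codes only, so it is partial.

```python
def build_tree(codes):
    """Build a nested-dict tree from {symbol: bit string}; leaves are symbols."""
    root = {}
    for sym, bits in codes.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(int(b), {})
        node[int(bits[-1])] = sym
    return root

def decode(enc, root, end="\0"):
    """Walk the tree bit by bit until the end symbol is decoded."""
    out = []
    node = root
    for byte in enc:
        for shift in range(7, -1, -1):       # most significant bit first
            bit = (byte >> shift) & 1        # extract the current bit
            node = node[bit]                 # 0 goes left, 1 goes right
            if not isinstance(node, dict):   # reached a leaf
                if node == end:
                    return "".join(out)      # padding bits are never read
                out.append(node)
                node = root                  # restart at the root
    return "".join(out)

codes = {"E": "000", "X": "0100000001", "\0": "001"}   # example codes only
enc = bytes([0b00000001, 0b00000001, 0b00000000, 0b00100000, 0b00100100])
decoded = decode(enc, build_tree(codes))
print(decoded)  # prints EEXEEEX
```

Stopping at an explicit end symbol is one way to keep the padding bits from being misread as data. Storing the number of encoded symbols alongside the bytes would be another.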
