JPEG Huffman Table

I have a question about the JPEG Huffman table and about using it to construct the symbol/binary string for each code from a tree. Suppose that the Huffman table lists more than 6 codes for the 3-bit code length. How do we add all of those codes to the tree? If I am correct, only 6 codes can be added at the 3-bit level (depth) of the tree. So how do we add the remaining codes if they won't fit at that level? Do we just ignore them?

Example

code length | Total Codes | Codes  
3-Bit       |    10       | 25 43 34 53 92 A2 B2 63 73 C2

In the above example, if we construct the binary strings in order, we can add the codes up to A2 at the 3-bit level of the tree, but what about B2, 63, 73 and C2? It isn't possible to add them at the 3-bit level, so what do we do with them?

Well, clearly, the absolute maximum number of "things" that can be represented in 3 bits is 8 (000, 001, 010, 011, 100, 101, 110, 111).

In Huffman coding, each bit selects the "left" or "right" branch in a trie. For the tree to continue to another level, SOME bit patterns have to mean "this continues another level", which is why not all 8 values can be used as 3-bit codes when longer codes also exist. If you have more values to encode, you need more bits for some of them. That is the whole point of Huffman coding: SOME codes are short and others are longer, sometimes even longer than the original value, but because the short codes go to the most common values, the long ones are rare.

How to construct and decode a Huffman tree takes about four or five pages in a typical algorithms book, and if you haven't got one of those, you probably want to find one, either on paper or as an e-book. There are LOTS of them. I'm not going to recommend one, since the ones I have are all 15+ years old.

I should add that I think your question is missing something. Clearly, 3 bits cannot possibly represent 10 values. And you can't build a [meaningful] Huffman tree on 10 values that are all different, unless the idea is to split the values into pairs of {2,5}, {4,3}, {3,4}, {5,3}, {9,2}, {A,2}, {B,2}, {6,3}, {7,3}, {C,2}, which gives a fair number of repeated values. The frequencies of those are 2: 5, 3: 5, 4: 2, 5: 2, and 1 each for 6, 7, 9, A, B and C.

But that's still too many to represent anything meaningful...

Or is it the other way around, that we are supposed to use the bit values of those to decode? In which case we'd need the tree built from the original data to decode it...

In JPEG, a Huffman code can be up to 16 bits long. The DHT marker contains an array of 16 elements giving the number of codes for each length.
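As a rough illustration of that layout: each table inside a DHT segment starts with one class/ID byte, followed by the 16 count bytes, followed by the symbol values themselves. The sketch below reads one such table. The names `HuffTable` and `parse_dht_table` are made up for this example and are not from any library, and the input is assumed to start at the class/ID byte (after the marker and length fields).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one Huffman table read from a DHT segment. */
typedef struct {
    unsigned char tableClass;   /* high nibble of first byte: 0 = DC, 1 = AC */
    unsigned char tableId;      /* low nibble of first byte: table id        */
    unsigned char counts[16];   /* counts[i] = number of codes of length i+1 */
    unsigned char symbols[256]; /* symbol values, in order of increasing code */
    int totalSymbols;           /* sum of all 16 counts                      */
} HuffTable;

/* Reads one table starting at the class/ID byte.
 * Returns the number of bytes consumed, or 0 if the data is truncated
 * or the counts add up to more than 256 symbols. */
size_t parse_dht_table(const unsigned char *p, size_t avail, HuffTable *out)
{
    size_t i;
    int total = 0;

    assert(p != NULL && out != NULL);
    if (avail < 17)
        return 0;                       /* need class/ID byte + 16 counts */

    out->tableClass = (unsigned char)(p[0] >> 4);
    out->tableId = (unsigned char)(p[0] & 0x0F);
    for (i = 0; i < 16; i++) {
        out->counts[i] = p[1 + i];
        total += p[1 + i];
    }
    if (total > 256 || avail < 17 + (size_t)total)
        return 0;                       /* symbols missing or too many */

    memcpy(out->symbols, p + 17, (size_t)total);
    out->totalSymbols = total;
    return 17 + (size_t)total;
}
```

A DHT segment may hold several tables back to back, so a caller would presumably keep calling this on the remaining bytes until the segment length is used up.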

The JPEG standard explains how to use the code counts to do the Huffman translation. It is one of the few things explained in detail.

This book explains how it is done from a programmer's perspective.

JPEG Book

The number of codes that can exist at any code length depends upon the counts for the other lengths.
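One way to see that dependency is to count free tree nodes level by level: every node not used as a code at one depth splits into two candidate nodes at the next depth. The following is a minimal sketch of that bookkeeping, with a hypothetical function name `counts_fit`; it only checks that the counts fit in a binary tree and makes no claim about how any particular decoder validates tables.

```c
#include <assert.h>

/* Returns 1 if the 16 per-length code counts can all be placed in a
 * binary code tree, 0 otherwise. counts[i] is the number of codes of
 * length i+1, as stored in a DHT table. */
int counts_fit(const unsigned char counts[16])
{
    long freeSlots = 1;   /* only the root exists before descending */
    int len;

    assert(counts != 0);
    for (len = 1; len <= 16; len++) {
        freeSlots *= 2;                   /* each free node splits in two */
        if (counts[len - 1] > freeSlots)
            return 0;                     /* more codes than nodes here   */
        freeSlots -= counts[len - 1];     /* used nodes become leaves     */
    }
    return 1;
}
```

With no shorter codes there are 8 slots at 3 bits, so a count of 10 cannot fit. With one 2-bit code already assigned, 3 of the 4 two-bit nodes remain free and exactly 6 slots exist at 3 bits, which may be where the "6" in the question comes from. As far as I know, JPEG tables also leave the all-ones code unused, so real tables end with at least one slot to spare.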

I am wondering if you are really looking at the count of codes for length 4 rather than 3.

It looks like you're not following the correct procedure when creating your Huffman codes from the JPEG table. The counts provided will always fit in the given number of bits unless the table has been corrupted. Reading the codes out of a DHT marker is really simple. The more complicated part is how you define your lookup table from that data. A logical (but not practical) way is to create a reverse lookup table indexed by the maximum code length (16 bits = 65536 entries in the table). Then, to decode your JPEG data, just pick up 16 bits of compressed data from the input stream and use them as an index into the table, where you'll find the symbol and the actual length of the code. I came up with a way to use a single, much smaller lookup table, but I'm not going to share that specific method. What I will share is the basic format of the loop that creates the codes from a DHT marker:

int iCurrentCode; // the current Huffman code
int iLength; // the code length in bits that you're working on
int i;
int iCount; // the number of codes defined for this length
int iSymbol; // JPEG symbol defined for each Huffman code
unsigned char *pData; // must point at the first of the 16 count bytes in the DHT marker (just after the table class/ID byte)
unsigned char *pSymbols; // pointer to the symbol values, which follow the 16 count bytes

iCurrentCode = 0; // start with a Huffman code of 0
pSymbols = pData + 16; // the symbols are stored after all 16 counts, not interleaved with them
for (iLength = 1; iLength <= 16; iLength++)
{
    iCount = *pData++; // get number of symbols for this bit length
    for (i=0; i<iCount; i++) // read each of the codes for this bit length
    {
        iSymbol = *pSymbols++; // get the JPEG symbol value (e.g. RRRR/SSSS value)
        // It's up to you to create a lookup table from the code (iCurrentCode, iLength bits long) and its value (iSymbol)
        iCurrentCode++; // the Huffman bit pattern just increments for each code value
    } // for each code defined at this bit length
    iCurrentCode <<= 1; // shift the code left 1 bit to advance to the next bit length
} // for each bit length
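For illustration only, here is one way the "logical but not practical" 65536-entry reverse lookup table mentioned above could be filled in with the same incrementing-code loop. This is not the smaller table the author alludes to. The names `LutEntry` and `build_full_lut` are hypothetical, and the counts and symbols are assumed to have been separated out of the DHT data already.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char symbol;  /* decoded JPEG symbol                        */
    unsigned char length;  /* code length in bits; 0 = no code matches   */
} LutEntry;

/* Fills lut[0..65535] so that indexing it with the next 16 bits of the
 * stream gives the symbol and the real code length.
 * Returns 1 on success, 0 if the counts do not fit (corrupt table). */
int build_full_lut(const unsigned char counts[16],
                   const unsigned char *symbols, LutEntry *lut)
{
    unsigned long code = 0;   /* current Huffman code, len bits wide */
    unsigned long j;
    size_t k = 0;             /* index of the next symbol            */
    int len, i;

    assert(counts != NULL && symbols != NULL && lut != NULL);
    for (j = 0; j < 65536UL; j++) {
        lut[j].symbol = 0;
        lut[j].length = 0;
    }
    for (len = 1; len <= 16; len++) {
        for (i = 0; i < counts[len - 1]; i++) {
            unsigned long first, span;
            if (code >= (1UL << len))
                return 0;                 /* too many codes at this length */
            first = code << (16 - len);   /* code left-aligned in 16 bits  */
            span = 1UL << (16 - len);     /* all values sharing the prefix */
            for (j = 0; j < span; j++) {
                lut[first + j].symbol = symbols[k];
                lut[first + j].length = (unsigned char)len;
            }
            k++;
            code++;                       /* next code at this length      */
        }
        code <<= 1;                       /* move on to the next length    */
    }
    return 1;
}
```

A decoder built on this would peek at the next 16 bits of the stream, read `lut[bits]`, and then consume only `length` bits before repeating. The table costs 128 KB per Huffman table at two bytes per entry, which is the main reason the approach is impractical.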
