
JPEG Huffman Table

I have a question about the JPEG Huffman table and about using it to construct the symbol/binary strings from a tree. Suppose that, in a Huffman table, the number of codes listed for the 3-bit code length is greater than 6. How do we add all of those codes to the tree? If I am correct, only 6 codes can be added at the 3-bit level (depth) of the tree. So how do we add the remaining codes if they won't fit at that level? Do we just ignore them?

Example

Code length | Total codes | Codes
3-bit       |    10       | 25 43 34 53 92 A2 B2 63 73 C2

In the example above, if we construct the binary strings for the codes in order, then up to A2 we can add codes to the tree at the 3-bit level, but what about B2, 63, 73 and C2? Is it not possible to add them at the 3-bit level of the tree? If so, what do we do with them?

Well, clearly, the absolute maximum number of "things" that can be represented in 3 bits is 8 (000, 001, 010, 011, 100, 101, 110, 111).

In Huffman encoding, bits represent "left" or "right" in a trie data structure. To be able to "continue" to a deeper level, you have to use some bit patterns to mean "this continues another level", which is why not all 8 patterns can be used as codes at 3 bits when longer codes also exist. If you have more values to encode, you need to use more bits for some of them. This is the whole point of Huffman coding: some codes are short, others are longer, sometimes even longer than the original value, but because the lengths are based on how common each value is, that's fine, because the long ones will be rare.
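To make the "left"/"right" picture concrete, here is a minimal sketch of a binary trie in C. The names `Trie`, `trie_insert` and `trie_decode`, the fixed 64-node pool, and the use of '0'/'1' strings in place of a real bit stream are all assumptions made for illustration, not anything taken from JPEG itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout: child[0] is "left" (bit 0), child[1] is "right"
   (bit 1); -1 means no child. A node with symbol >= 0 is a leaf. */
typedef struct {
    int child[2];
    int symbol;
} TrieNode;

#define MAX_NODES 64

typedef struct {
    TrieNode nodes[MAX_NODES];
    int count;
} Trie;

static int trie_new_node(Trie *t)
{
    assert(t->count < MAX_NODES);
    t->nodes[t->count].child[0] = -1;
    t->nodes[t->count].child[1] = -1;
    t->nodes[t->count].symbol = -1;
    return t->count++;
}

void trie_init(Trie *t)
{
    t->count = 0;
    trie_new_node(t); /* node 0 is the root */
}

/* Insert a code written as a string of '0'/'1' characters.
   Returns 0 on success, -1 if the code would break the prefix property. */
int trie_insert(Trie *t, const char *bits, int symbol)
{
    int node = 0;
    for (; *bits; bits++) {
        int b = (*bits == '1');
        if (t->nodes[node].symbol >= 0)
            return -1; /* an existing code is a prefix of this one */
        if (t->nodes[node].child[b] < 0) {
            int n = trie_new_node(t);
            t->nodes[node].child[b] = n;
        }
        node = t->nodes[node].child[b];
    }
    if (t->nodes[node].symbol >= 0 ||
        t->nodes[node].child[0] >= 0 || t->nodes[node].child[1] >= 0)
        return -1; /* duplicate code, or this code is a prefix of another */
    t->nodes[node].symbol = symbol;
    return 0;
}

/* Decode one symbol from a '0'/'1' string, starting at *pos.
   Returns the symbol, or -1 if the bits run out or no branch exists. */
int trie_decode(const Trie *t, const char *bits, size_t *pos)
{
    int node = 0;
    while (t->nodes[node].symbol < 0) {
        char c = bits[*pos];
        if (c != '0' && c != '1')
            return -1;
        node = t->nodes[node].child[c == '1'];
        if (node < 0)
            return -1;
        (*pos)++;
    }
    return t->nodes[node].symbol;
}
```

Once "100" is a leaf, neither "10" nor "1000" can be inserted. That is the sense in which a pattern used as a code can no longer "continue another level".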

How to construct and decode a Huffman tree takes about four or five pages in a typical algorithms book. If you haven't got one of those, you probably want to find one, either a real paper one or an e-book. There are lots of them; I'm not going to recommend one, since the ones I have are all 15+ years old.

I should add that I think your question is missing something. Clearly, 3 bits cannot possibly represent 10 values. And you can't build a [meaningful] Huffman tree on 10 values that are all different, unless the idea is to split the values into pairs: {2,5}, {4,3}, {3,4}, {5,3}, {9,2}, {A,2}, {B,2}, {6,3}, {7,3}, {C,2}. That gives a fair number of repeated values, with these frequencies: 2: 5, 3: 5, 4: 2, 5: 2, 6: 1, 7: 1, 9: 1, A: 1, B: 1, C: 1.

But that's still too many to represent anything meaningful...

Or is it the other way around, and we are supposed to use the bit values of those to decode? In that case we'd need the tree built from the original data to decode it...

In JPEG, a Huffman code can be up to 16 bits long. The DHT marker contains an array of 16 elements giving the number of codes for each length.
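As a rough sketch of what reading that array looks like, the following C function parses one table from the body of a DHT segment. It assumes `p` points at the table class/destination byte that follows the two-byte segment length; the struct and function names (`DhtTable`, `dht_parse_table`) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one Huffman table read from a DHT segment. */
typedef struct {
    unsigned char table_class;  /* high nibble of first byte: 0 = DC, 1 = AC */
    unsigned char table_id;     /* low nibble of first byte: destination id */
    unsigned char counts[16];   /* counts[i] = number of codes of length i+1 */
    unsigned char symbols[256]; /* symbol values, in order of increasing code length */
    int num_symbols;            /* sum of the 16 counts */
} DhtTable;

/* Parse one table starting at p, with `avail` bytes remaining in the segment.
   Returns the number of bytes consumed, or 0 if the data is truncated or the
   counts add up to more than 256 symbols. */
size_t dht_parse_table(const unsigned char *p, size_t avail, DhtTable *out)
{
    int i, total = 0;
    assert(p != NULL && out != NULL);
    if (avail < 17)
        return 0; /* need the class/id byte plus 16 counts */
    out->table_class = (unsigned char)(p[0] >> 4);
    out->table_id = (unsigned char)(p[0] & 0x0F);
    for (i = 0; i < 16; i++) {
        out->counts[i] = p[1 + i];
        total += p[1 + i];
    }
    if (total > 256 || avail < (size_t)(17 + total))
        return 0;
    memcpy(out->symbols, p + 17, (size_t)total); /* symbols follow the counts */
    out->num_symbols = total;
    return (size_t)(17 + total);
}
```

A single DHT segment may hold more than one table, so the returned byte count lets the caller advance `p` and call the function again until the segment is used up.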

The JPEG standard explains how to use the code counts to do the Huffman translation. It is one of the few things explained in detail.

This book explains how it is done from a programmer's perspective.

JPEG Book

The number of codes that can exist at any code length depends upon the counts for the other lengths.
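That dependency can be worked out directly from the 16 counts. The sketch below uses hypothetical helper names and checks only the generic prefix-code limit. As far as I know, JPEG additionally keeps the all-ones code of each length unused, so a real table may allow one code fewer than this reports.

```c
#include <assert.h>

/* Returns 0 if the 16 counts describe a valid prefix code, otherwise the
   first code length (1..16) that asks for more codes than are available. */
int huff_first_overfull_length(const unsigned char counts[16])
{
    int length;
    long code = 0; /* next unassigned code at the current length */
    for (length = 1; length <= 16; length++) {
        code += counts[length - 1];
        if (code > (1L << length))
            return length;
        code <<= 1; /* every unused code gains one bit at the next length */
    }
    return 0;
}

/* Number of codes of `length` bits still free once all shorter lengths have
   been assigned. Negative if a shorter length was already over-full. */
long huff_free_codes_at(const unsigned char counts[16], int length)
{
    int l;
    long code = 0;
    assert(length >= 1 && length <= 16);
    for (l = 1; l < length; l++)
        code = (code + counts[l - 1]) << 1;
    return (1L << length) - code;
}
```

With a single 2-bit code already assigned, `huff_free_codes_at` reports 6 free codes at 3 bits, and with no shorter codes at all it reports 8. Either way a count of 10 at length 3 cannot fit, so `huff_first_overfull_length` returns 3 for such a table.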

I am wondering if you are really looking at the count of codes for length 4 rather than 3.

It looks like you're not following the correct procedure when creating your Huffman codes from the JPEG table. The counts provided will fit in the number of bits unless the table has been corrupted. Reading the codes out of a DHT marker is really simple. The more complicated part is how you define your lookup table from that data. A logical (but not practical) way is to create a reverse lookup table sized for the maximum code length (16 bits = 65536 entries in the table). Then, to decode your JPEG data, just pick up 16 bits of compressed data from the input stream and use them as an index into the table, where you'll find the symbol and the actual length of the code. I came up with a way to use a single, much smaller lookup table, but I'm not going to share my specific code table method. What I will share is the basic format of the loop to create the codes from a DHT marker:

int iCurrentCode; // the current Huffman code
int iLength; // the code length in bits that you're working on
int i;
int iCount; // the number of codes defined for this length
int iSymbol; // JPEG symbol defined for each Huffman code
unsigned char *pCounts; // set by the caller: points to the 16 code counts in the DHT marker
unsigned char *pSymbols; // points to the symbol values in the DHT marker

pSymbols = pCounts + 16; // all 16 counts come first; the symbols follow them
iCurrentCode = 0; // start with a Huffman code of 0
for (iLength = 1; iLength <= 16; iLength++)
{
    iCount = pCounts[iLength - 1]; // get number of symbols for this bit length
    for (i = 0; i < iCount; i++) // read each of the codes for this bit length
    {
        iSymbol = *pSymbols++; // get the JPEG symbol value (e.g. RRRR/SSSS value)
        // It's up to you to create a lookup table from the code and its value;
        // here the iLength-bit pattern in iCurrentCode maps to iSymbol
        iCurrentCode++; // the Huffman bit pattern just increments for each code value
    } // for each code defined at this bit length
    iCurrentCode <<= 1; // shift the code left 1 bit to advance to the next bit length
} // for each bit length
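For completeness, here is a sketch of the "logical but not practical" 65536-entry reverse lookup table described above, built with the same incrementing-code loop. This is not the smaller table I mentioned. The names `LutEntry`, `build_lut16` and `lut16_decode` are invented for the example, and the caller is assumed to supply a 16-bit window whose top bit is the next unread bit of the stream.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    unsigned char symbol; /* decoded JPEG symbol */
    unsigned char length; /* code length in bits; 0 means "no code here" */
} LutEntry;

/* Build the full 16-bit table from the 16 counts and the symbol list.
   Returns a malloc'd table of 65536 entries, or NULL on allocation failure
   or if the counts do not fit in the available bits. */
LutEntry *build_lut16(const unsigned char counts[16], const unsigned char *symbols)
{
    LutEntry *lut;
    uint32_t code = 0;
    int length, i;
    assert(counts != NULL && symbols != NULL);
    lut = calloc(65536, sizeof *lut);
    if (lut == NULL)
        return NULL;
    for (length = 1; length <= 16; length++) {
        for (i = 0; i < counts[length - 1]; i++) {
            uint32_t first, span, k;
            if (code >= ((uint32_t)1 << length)) { /* corrupted table */
                free(lut);
                return NULL;
            }
            /* Every 16-bit value that starts with this code gets the same entry. */
            first = code << (16 - length);
            span = (uint32_t)1 << (16 - length);
            for (k = 0; k < span; k++) {
                lut[first + k].symbol = *symbols;
                lut[first + k].length = (unsigned char)length;
            }
            symbols++;
            code++;
        }
        code <<= 1;
    }
    return lut;
}

/* Look up the next symbol. Returns the symbol and stores the number of bits
   to consume in *bits_used, or returns -1 if no code matches the window. */
int lut16_decode(const LutEntry *lut, uint16_t window, int *bits_used)
{
    const LutEntry *e = &lut[window];
    if (e->length == 0)
        return -1;
    *bits_used = e->length;
    return e->symbol;
}
```

The trade-off is memory for speed: each lookup is a single array access, but every table costs 65536 entries, and most of them are repeated copies of the short codes.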
