Why do we need unsigned char for Huffman tree code
I am trying to create a Huffman tree, and the question I was given seems very strange to me. It is as follows:
Given the following data structure:
struct huffman {
    unsigned char sym;             /* symbol */
    struct huffman *left, *right;  /* left and right subtrees */
};
write a program that takes the name of a binary file as sole argument, builds the Huffman tree of that file assuming that atoms (elementary symbols) are 8-bit unsigned characters, and prints the tree as well as the dictionary.
Allocations must be done using nothing other than malloc(), and sorting can be done using qsort().
What confuses me is that, to write a program that creates a Huffman tree, we just need to do the following things:
Farray[]={.......}
Now the question is: why and where do we need that unsigned char data? What kind of unsigned char data does this question want? I think the frequencies alone are enough to display a Huffman tree.
If you purely want to display the shape of the tree, then yes, you just need to build it. However, for the tree to be of any use you need to know which original symbol each leaf represents.
Imagine your input symbols are [ABCD]. An imaginary Huffman tree/dictionary might look like this:
        ( )
       /   \           A = 1
     ( )   (A)         B = 00
     / \               C = 010
   (B) ( )             D = 011
       / \
     (C) (D)
If you don't store sym, it looks like this:
        ( )
       /   \           A = ?
     ( )   ( )         B = ?
     / \               C = ?
   ( ) ( )             D = ?
       / \
     ( ) ( )
Not very useful, is it?
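To make the link between sym and the dictionary concrete, here is a minimal sketch of how the codes in the first diagram could be read off a finished tree. The helper name collect_codes and the 256x256 table of bit strings are assumptions made for illustration, not part of the assignment. Because sym is an unsigned char, it can be used directly as an index from 0 to 255.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct huffman {
    unsigned char sym;             /* symbol */
    struct huffman *left, *right;  /* left and right subtrees */
};

/*
 * Hypothetical helper: walks the tree and stores, for every leaf, the
 * path from the root as a string of '0' (left) and '1' (right).
 * path must have room for 256 chars; codes[s] receives the code of s.
 */
static void collect_codes(const struct huffman *node, char *path,
                          size_t depth, char codes[256][256])
{
    assert(node != NULL);
    if (node->left == NULL && node->right == NULL) {
        /* Leaf: sym says which symbol this path encodes. */
        path[depth] = '\0';
        strcpy(codes[node->sym], path);
        return;
    }
    if (node->left != NULL) {
        path[depth] = '0';
        collect_codes(node->left, path, depth + 1, codes);
    }
    if (node->right != NULL) {
        path[depth] = '1';
        collect_codes(node->right, path, depth + 1, codes);
    }
}
```

Called on the root with depth 0, this fills codes['A'], codes['B'] and so on. Without sym in the leaves there would be nothing to index the table with. A tree consisting of a single leaf gets an empty code in this sketch.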
Edit 2: The missing step in the plan is step 0: build the frequency array from the file (somehow I missed that you don't actually need to encode the file as well). This isn't part of the Huffman algorithm itself and I couldn't find a decent example to link to, so here's a rough idea:
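FILE *input = fopen("inputfile", "rb");
int freq[256] = {0};    /* one counter per possible byte value */
int c;
if (input == NULL) {
    perror("inputfile");
    return 1;
}
/* fgetc() returns each byte as an unsigned char converted to int, or EOF */
while ((c = fgetc(input)) != EOF)
    freq[c]++;
fclose(input);
/* do Huffman algorithm */
...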
Now, that still needs improving, since it neither uses malloc() nor takes a filename as an argument, but it's not my homework ;)
It's been a while since I did this, but I think the generated "dictionary" is what you need to encode data, while the "tree" is used to decode it. Of course, you can always build one from the other.
While decoding, you traverse the tree (left or right, according to successive input bits), and when you hit a terminal node (one whose left and right pointers are both null), the 'sym' in that node is the output value.
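The traversal just described might be sketched as follows. For readability the input bits are given as a string of '0' and '1' characters, with '0' meaning left and '1' meaning right; that representation, the bit convention, and the name decode_bits are assumptions for illustration, and the sketch expects a tree with at least two leaves.

```c
#include <assert.h>
#include <stddef.h>

struct huffman {
    unsigned char sym;             /* symbol */
    struct huffman *left, *right;  /* left and right subtrees */
};

/*
 * Hypothetical helper: decodes a string of '0'/'1' characters.
 * Writes at most cap symbols to out and returns how many were written.
 */
size_t decode_bits(const struct huffman *root, const char *bits,
                   unsigned char *out, size_t cap)
{
    const struct huffman *node = root;
    size_t n = 0;

    assert(root != NULL);
    for (; *bits != '\0' && n < cap; bits++) {
        /* Follow one branch per input bit. */
        node = (*bits == '0') ? node->left : node->right;
        if (node == NULL)
            return n;              /* malformed input for this tree */
        if (node->left == NULL && node->right == NULL) {
            out[n++] = node->sym;  /* leaf reached: emit its symbol */
            node = root;           /* restart at the root */
        }
    }
    return n;
}
```

This is the second place where sym is indispensable: the shape of the tree tells the decoder when a code ends, but only sym tells it which byte to output.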
Usually data compression is divided into 2 big steps; given a stream of data:
In practice it's a little more complicated than this, because trees are involved, but the main purpose is always to build the dictionary.
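For the tree part, one simple way to stay within the assignment's constraints (only malloc() for allocation, qsort() for sorting) is to keep the candidate subtrees in an array, sort it by weight, and merge the two lightest entries until one remains. The sketch below does that; the struct item bookkeeping, the helper names, and the re-sort on every round are assumptions chosen for brevity, not the only or the fastest approach.

```c
#include <assert.h>
#include <stdlib.h>

struct huffman {
    unsigned char sym;             /* symbol */
    struct huffman *left, *right;  /* left and right subtrees */
};

/* Hypothetical bookkeeping: the given struct has no weight field. */
struct item {
    struct huffman *node;
    unsigned long weight;
};

/* qsort() comparator: heaviest first, so the two lightest end up last. */
static int by_weight_desc(const void *a, const void *b)
{
    const struct item *x = a, *y = b;
    if (x->weight != y->weight)
        return (x->weight < y->weight) ? 1 : -1;
    return 0;
}

static struct huffman *new_node(unsigned char sym,
                                struct huffman *l, struct huffman *r)
{
    struct huffman *n = malloc(sizeof *n);
    if (n != NULL) {
        n->sym = sym;
        n->left = l;
        n->right = r;
    }
    return n;
}

/* Returns the root, or NULL if no byte occurs or malloc() fails.
   Sketch only: nodes allocated before a failure are not freed. */
struct huffman *build_tree(const unsigned long freq[256])
{
    struct item *items;
    struct huffman *root = NULL;
    size_t n = 0;
    int s;

    assert(freq != NULL);
    items = malloc(256 * sizeof *items);
    if (items == NULL)
        return NULL;

    /* One leaf per byte value that actually occurs. */
    for (s = 0; s < 256; s++) {
        if (freq[s] == 0)
            continue;
        items[n].node = new_node((unsigned char)s, NULL, NULL);
        if (items[n].node == NULL) {
            free(items);
            return NULL;
        }
        items[n].weight = freq[s];
        n++;
    }

    /* Repeatedly merge the two lightest subtrees. */
    while (n > 1) {
        struct huffman *parent;
        qsort(items, n, sizeof *items, by_weight_desc);
        parent = new_node(0, items[n - 2].node, items[n - 1].node);
        if (parent == NULL) {
            free(items);
            return NULL;
        }
        items[n - 2].node = parent;
        items[n - 2].weight += items[n - 1].weight;
        n--;
    }

    if (n == 1)
        root = items[0].node;
    free(items);
    return root;
}
```

When several subtrees have equal weight, qsort() gives no guarantee about their relative order, so the exact shape of the tree (and therefore the exact codes) can differ between runs or platforms while the code lengths remain optimal.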
There is a complete tutorial here.