
Byte frequency table for Huffman coding

I'm writing a Huffman compressor and decompressor (in C++) that needs to work on arbitrary binary files. I need a bit of data structure advice. Right now, my compression process is as follows:

  • Read the bytes of the file in binary form into a char* buffer.
  • Use an std::map to count the frequency of each byte pattern in the file. (This is where I think I'm asking for trouble.)
  • Build the binary tree based on the frequency histogram. Each internal node has the sum of the frequencies of its children, and each leaf node has a char* to represent the actual byte.
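The tree-building step in the list above can be sketched as follows. This is a minimal illustration, not the asker's code: Node, Tree and build_tree are hypothetical names, nodes live in a vector and refer to each other by index, and each leaf stores the byte by value rather than through a char*.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical node layout: children are indices into Tree::nodes, -1 = none.
struct Node {
    std::uint64_t freq;   // leaf: byte count; internal: sum of both children
    int left;
    int right;
    unsigned char byte;   // meaningful only for leaves
};

struct Tree {
    std::vector<Node> nodes;
    int root = -1;        // stays -1 if the input was empty
};

// Build a Huffman tree from a 256-entry byte histogram.
Tree build_tree(const std::array<std::uint64_t, 256>& freq) {
    Tree t;
    using Item = std::pair<std::uint64_t, int>;  // (frequency, node index)
    // Min-heap: the item with the smallest frequency is on top.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    // One leaf per byte value that actually occurs.
    for (std::size_t b = 0; b < freq.size(); ++b) {
        if (freq[b] == 0) continue;
        t.nodes.push_back({freq[b], -1, -1, static_cast<unsigned char>(b)});
        pq.push({freq[b], static_cast<int>(t.nodes.size()) - 1});
    }

    // Repeatedly merge the two lightest subtrees under a new internal node.
    while (pq.size() > 1) {
        Item a = pq.top(); pq.pop();
        Item b = pq.top(); pq.pop();
        assert(a.first <= b.first);  // a is the lightest remaining subtree
        t.nodes.push_back({a.first + b.first, a.second, b.second, 0});
        pq.push({a.first + b.first, static_cast<int>(t.nodes.size()) - 1});
    }

    if (!pq.empty()) t.root = pq.top().second;
    return t;
}
```

With this layout the root's freq equals the total number of bytes counted. A file containing only one distinct byte value yields a tree whose root is a leaf, which the code-assignment step would need to treat as a special case.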

This is where I'm at so far.

My question is what exactly I'm measuring if I just use a map from char* to int. If I'm correct, this isn't actually what I need. What I think I'm really doing is tracking the actual 4-byte pointer values by using char*.

So, what I plan to do is use a map from char to int for the histogram and a char for the data stored at leaf nodes. Is my logic sound here? My reasoning tells me yes, but since this is my first time dealing with binary data, I'd like to be careful of pitfalls that will only show up in strange ways.

Thanks.

You don't need a map; there are only 256 possible values. Just have int freq[256] = {0} and add to it with freq[data[idx]]++ for each byte in the input. If data is a plain char*, cast each element to unsigned char first, since char may be signed and a negative index would go out of bounds.
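A minimal sketch of that array-based histogram, assuming the file has already been read into a char buffer as described in the question. The function name count_bytes and the use of std::array with 64-bit counters (instead of int freq[256]) are choices made for this illustration, not part of the answer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Count how often each byte value (0..255) occurs in a buffer.
// count_bytes is a hypothetical helper name used only for this sketch.
std::array<std::uint64_t, 256> count_bytes(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);   // an empty buffer may be null
    std::array<std::uint64_t, 256> freq{};  // value-initialized: all zeros
    for (std::size_t i = 0; i < size; ++i) {
        // char may be signed, so bytes >= 0x80 could turn into negative
        // indices; convert to unsigned char before indexing.
        ++freq[static_cast<unsigned char>(data[i])];
    }
    return freq;
}
```

Calling count_bytes(buffer, length) on the whole file buffer yields the table that the tree-building step consumes. The 64-bit counters are there only so that very large inputs cannot overflow a count.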

If you REALLY want a map, use map<unsigned char, int>; your suspicion about using a map from char* is correct.
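To illustrate why the key type matters, here is a small sketch comparing the two maps. Both function names are hypothetical. A map keyed by const char* orders and compares addresses, so every position in the buffer becomes its own key, while a map keyed by unsigned char groups equal byte values together. (The pointer size is platform dependent and is often 8 bytes rather than 4 on 64-bit systems.)

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Keyed by address: each buffer position is a distinct key with count 1.
std::size_t distinct_keys_by_pointer(const char* data, std::size_t size) {
    std::map<const char*, int> hist;
    for (std::size_t i = 0; i < size; ++i) {
        ++hist[data + i];
    }
    assert(hist.size() == size);  // one entry per byte, whatever the content
    return hist.size();
}

// Keyed by byte value: equal bytes share one entry.
std::size_t distinct_keys_by_value(const char* data, std::size_t size) {
    std::map<unsigned char, int> hist;
    for (std::size_t i = 0; i < size; ++i) {
        ++hist[static_cast<unsigned char>(data[i])];
    }
    return hist.size();
}
```

For a buffer holding four copies of the same byte, the pointer-keyed version reports four distinct keys and the value-keyed version reports one, which is the histogram the compressor actually needs.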

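Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.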

 