
Outputting Huffman codes to file

I have a program that reads a file and records the frequency of each character. It then builds a Huffman tree from those frequencies and writes the tree's Huffman codes to a file.

So an input like "Hello World" would write this sequence to the file:

01010101 0010 010 010 01010 0101010 000 01010 00101 010 0001

This makes sense because the most frequent characters have the shortest codes. The problem is that this increases the file size tenfold. I realized the reason is that each 1 and 0 is written as its own character, so each one gets expanded to a full byte of data.

I was thinking I could convert each code (e.g. "010") to a character and save that to the file, but that would still pad the code to a full byte (or mangle it if the code is longer than a byte).

How do I go about this? I can give code snippets if needed. I'm basically saving each code into a string, which is why the file comes out so big (each "bit" is output as a byte). If I were to convert the code to a long, for example, then a code like 00010 would be represented as 2, and a code like 010 would also be represented as 2.

You basically have to do it a byte (or a word) at a time. Maintain a byte that you fill with bits, along with a count of how many bits have been filled in so far. When the count reaches 8, write the byte out and start over with an empty one.
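
As a rough illustration of that approach, here is a minimal sketch in Python (the question doesn't name a language, so this choice is an assumption). The names `BitWriter` and `pack_codes` are hypothetical, and so are the decisions to fill each byte from the most significant bit down and to pad the final partial byte with zero bits. Because codes are written one bit at a time, leading zeros are preserved, so 00010 and 010 no longer collapse to the same value.

```python
import io


class BitWriter:
    """Packs individual bits into bytes, most significant bit first (an assumed order)."""

    def __init__(self, stream):
        self.stream = stream  # any binary file-like object
        self.current = 0      # the byte currently being filled
        self.filled = 0       # how many bits of `current` are in use

    def write_bit(self, bit):
        # Shift the existing bits left and append the new bit on the right.
        self.current = (self.current << 1) | (bit & 1)
        self.filled += 1
        if self.filled == 8:
            # The byte is full: write it out and start over with an empty one.
            self.stream.write(bytes([self.current]))
            self.current = 0
            self.filled = 0

    def write_code(self, code):
        """Write a code given as a string of '0' and '1' characters."""
        for ch in code:
            self.write_bit(1 if ch == "1" else 0)

    def flush(self):
        """Pad the last partial byte with zero bits and return the pad count."""
        if self.filled == 0:
            return 0
        padding = 8 - self.filled
        self.stream.write(bytes([self.current << padding]))
        self.current = 0
        self.filled = 0
        return padding


def pack_codes(codes):
    """Pack a sequence of code strings; return (packed bytes, padding bit count)."""
    buf = io.BytesIO()
    writer = BitWriter(buf)
    for code in codes:
        writer.write_code(code)
    padding = writer.flush()
    return buf.getvalue(), padding


packed, padding = pack_codes(["010", "00010"])
print(packed, padding)  # the 8 bits 01000010 fit in one byte: b'B' 0
```

To write to a real file, a stream opened with `open(path, "wb")` could be passed to `BitWriter` in place of the `BytesIO` buffer. A decoder would also need to know how many padding bits were added (or the total number of symbols), so that value would have to be stored somewhere alongside the packed data.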

