Huffman encode single character without lookup table

I am trying to implement Huffman coding and can't figure out how to encode a character using the trie without generating a lookup table. What I don't want to do is generate a map from each character to its encoded string of bits. I am trying to write a method that takes the root of the Huffman trie and a character and returns the correct code.

I have written the following code, which doesn't work correctly. The problem is that I don't know how to get it to stop after it finds the right result: it continues to add to the code after reaching the correct leaf node.

string lookup(HuffNode* root, char c, string prevCode, string direction){
  string currentCode = prevCode + direction;
  if(!root->isLeaf()){
    currentCode = lookup(root->getLeft(), c, currentCode, "0");
    currentCode = lookup(root->getRight(), c, currentCode, "1");
  }else{
    if(root->getChar() == c){
      return currentCode;
    }else{
      return prevCode;
    }
  }

  return currentCode;
}

string encodeChar(HuffNode* trie, char c){
  return lookup(trie, c, "", "");
} 

What would be the correct way to get the encoding of a character from a trie?

It would be terribly inefficient to have to search a good chunk of the tree every single time you want to emit a code. You should in fact make a table that holds the code for each symbol.
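For reference, the table can be filled with a single traversal of the trie. The following is only a sketch: the question does not show `HuffNode`, so the struct here is a hypothetical stand-in that mirrors the accessors used in the question's code, and `buildTable` and `makeCodeTable` are illustrative names, not part of the original post.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical minimal node type (the real HuffNode is not shown in the
// question); it only mirrors the accessors the question's code calls.
struct HuffNode {
    char ch = '\0';
    HuffNode* left = nullptr;
    HuffNode* right = nullptr;

    bool isLeaf() const { return left == nullptr && right == nullptr; }
    HuffNode* getLeft() const { return left; }
    HuffNode* getRight() const { return right; }
    char getChar() const { return ch; }
};

// Walks the whole trie once. `prefix` holds the bits on the path from the
// root to `node`; at each leaf that path is recorded as the symbol's code.
void buildTable(const HuffNode* node, std::string& prefix,
                std::unordered_map<char, std::string>& table) {
    if (node == nullptr) return;
    if (node->isLeaf()) {
        table[node->getChar()] = prefix;
        return;
    }
    prefix.push_back('0');                       // descend left
    buildTable(node->getLeft(), prefix, table);
    prefix.back() = '1';                         // descend right
    buildTable(node->getRight(), prefix, table);
    prefix.pop_back();                           // restore the caller's path
}

// Illustrative wrapper: returns a map from each symbol to its bit string.
std::unordered_map<char, std::string> makeCodeTable(const HuffNode* root) {
    std::unordered_map<char, std::string> table;
    std::string prefix;
    buildTable(root, prefix, table);
    return table;
}
```

With such a table, encoding a symbol is one map lookup instead of a tree search. Note that a trie consisting of a single leaf yields an empty code for its symbol in this sketch.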

In any case, what you need to do is build up the code in a single string that is passed by reference. Do not return the string. (When the search is done, the code will be in the string that was originally passed by reference.) Instead, return true or false to indicate whether the symbol was found in a leaf. Before each recursive call, add the appropriate bit to the string (0 for the left branch, 1 for the right). When you call lookup() on a branch, check whether it returns true. If it does, immediately return true. If the second branch also returns false, remove the bit you added and return false.

Then the original call will return true if it finds the symbol, and the string will hold the code. If it returns false, the symbol wasn't in any of the leaves, and the string will be empty.
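The approach described above could look roughly like this. It is a sketch under assumptions: `HuffNode` is not shown in the question, so the struct below is a hypothetical stand-in with the same accessor names, and the `lookup` signature is changed from the question's version to take the code string by reference.

```cpp
#include <string>

// Hypothetical minimal node type (the real HuffNode is not shown in the
// question); it only mirrors the accessors the question's code calls.
struct HuffNode {
    char ch = '\0';
    HuffNode* left = nullptr;
    HuffNode* right = nullptr;

    bool isLeaf() const { return left == nullptr && right == nullptr; }
    HuffNode* getLeft() const { return left; }
    HuffNode* getRight() const { return right; }
    char getChar() const { return ch; }
};

// Returns true if `c` is stored in a leaf at or below `node`. On success,
// `code` has the path bits from `node` to that leaf appended; on failure,
// `code` is left exactly as it was passed in.
bool lookup(const HuffNode* node, char c, std::string& code) {
    if (node == nullptr) return false;
    if (node->isLeaf()) return node->getChar() == c;

    code.push_back('0');                              // try the left branch
    if (lookup(node->getLeft(), c, code)) return true;

    code.back() = '1';                                // try the right branch
    if (lookup(node->getRight(), c, code)) return true;

    code.pop_back();                                  // not found: undo our bit
    return false;
}

// Returns the code for `c`, or an empty string if `c` is not in the trie.
std::string encodeChar(const HuffNode* trie, char c) {
    std::string code;
    lookup(trie, c, code);
    return code;
}
```

Because each call returns as soon as a branch reports success, nothing is appended after the correct leaf is reached, which is the problem in the question's version. In the worst case the search still visits every node, so this is only reasonable if codes are looked up rarely.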


 