How to code, and read from, a binary tree of modified(!) Huffman codes in PHP

I am writing a class for decoding fax data encoded with modified Huffman code.

Data is coded line by line: the data describes each pixel row. Lines are coded as records of variable length. The pixel bits are stored in the bits of code words, least significant first.

Currently the code word list (182 elements) is defined as an array:

/**
 * [0] code word
 * [1] length of code word
 * [2] run length of color bits
 * [3] 0 = white / 1 = black
 * [4] 1 = termination codes / 0 = make up codes
 */
const CODEWORDS = [
   [0b00110101, 8, 0, 0, 1],             // termination codes white
   [0b000111, 6, 1, 0, 1],
   [0b0111, 4, 2, 0, 1],
   [0b1000, 4, 3, 0, 1],
   [0b1011, 4, 4, 0, 1],
   [0b1100, 4, 5, 0, 1],
   [0b1110, 4, 6, 0, 1],
   [0b1111, 4, 7, 0, 1],
   [0b10011, 5, 8, 0, 1],
   ...
];

Before use, the array is sorted in descending order by code word length.

In a first approach I'm able to find the correct code words by repeatedly iterating over this array with foreach, but it is (not surprisingly!) terribly slow.

It is clear to me that an increase in performance can only be achieved using a binary tree. But even after looking at several explanations here and solutions (libraries) on GitHub, I can't figure out

  • how to transfer the data from the array into a binary tree
  • how to traverse the tree to reach the right leaf

If someone could help me with this, I would be very grateful.

Once you have the correct codes (see my comments on your question), start by building one set of codes for white and one for black. For each set, start the tree with one branch for a first bit of zero and another branch for a first bit of one. Split the set of codes in two: the codes that start with zero and the codes that start with one. For each of those subsets, make two branches again and split the subset on the second bit, and so on. Once you reach a branch that holds exactly one code and you have just used the last bit of that code, you have a leaf. In that leaf you store the symbol for the code, e.g. 63 for the white code 00110100. If you reach a branch that holds no codes, you again have a leaf, but this one results in a decoding error if it is ever reached.
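A minimal sketch of that construction, assuming the array layout from the question. Instead of recursively splitting the set, it inserts each code word bit by bit, starting from its first (most significant) bit, which produces the same tree. The names buildTree and SAMPLE_CODEWORDS and the 'run' / 'terminating' leaf keys are illustrative and not part of the question's code, and the two black entries are taken from the T.4 terminating-code table rather than from the question. A missing child plays the role of the error leaf.

```php
<?php
// Illustrative excerpt in the question's layout:
// [code word, length in bits, run length, colour (0 = white, 1 = black), terminating flag]
const SAMPLE_CODEWORDS = [
    [0b00110101, 8, 0, 0, 1],
    [0b000111,   6, 1, 0, 1],
    [0b0111,     4, 2, 0, 1],
    [0b1000,     4, 3, 0, 1],
    [0b11,       2, 2, 1, 1],   // black, run length 2
    [0b10,       2, 3, 1, 1],   // black, run length 3
];

/**
 * Build the decoding tree for one colour.
 * Inner nodes are arrays with optional children under the keys 0 and 1;
 * leaves are arrays with the keys 'run' and 'terminating'.
 * A child that does not exist stands for an error leaf.
 */
function buildTree(array $codewords, int $color): array
{
    $root = [];
    foreach ($codewords as [$code, $length, $run, $codeColor, $terminating]) {
        if ($codeColor !== $color) {
            continue;
        }
        $node = &$root;
        // The stored integer drops leading zeros, so the length field
        // decides how many bits to walk, most significant bit first.
        for ($i = $length - 1; $i >= 0; $i--) {
            $bit = ($code >> $i) & 1;
            if (!isset($node[$bit])) {
                $node[$bit] = [];
            }
            $node = &$node[$bit];
        }
        // The last bit of the code has been used: this node becomes a leaf.
        $node = ['run' => $run, 'terminating' => (bool) $terminating];
        unset($node); // drop the reference before handling the next code word
    }
    return $root;
}

$whiteTree = buildTree(SAMPLE_CODEWORDS, 0);
$blackTree = buildTree(SAMPLE_CODEWORDS, 1);
```

Sorting the array by code word length is not needed for this approach, because every code word ends up at the position given by its own bits.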

To decode, take the first bit and go down the corresponding branch. Choose the next branch depending on the second bit, and so on until you reach a leaf. Then emit that symbol and start again at the root with the next bit. Terminate if you end up at an error leaf.
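The walk can be sketched as follows. This is an illustration under several assumptions: the input is a string of '0' and '1' characters rather than packed bytes, the tree is a small hand-written one that covers only four black codes (010, 011, 10 and 11 from the T.4 table), and the names decodeSymbols and TINY_BLACK_TREE are made up for the example. Inner nodes hold children under the keys 0 and 1, leaves hold a 'run' key, and a missing child stands for an error leaf.

```php
<?php
// Hand-written tree for four black codes: 010 => 1, 011 => 4, 10 => 3, 11 => 2.
const TINY_BLACK_TREE = [
    0 => [
        1 => [
            0 => ['run' => 1],
            1 => ['run' => 4],
        ],
    ],
    1 => [
        0 => ['run' => 3],
        1 => ['run' => 2],
    ],
];

/**
 * Decode a string of '0'/'1' characters with a single tree.
 * Returns the list of symbols (run lengths) in the order they were found.
 * Bits left over after the last complete code are ignored.
 */
function decodeSymbols(array $root, string $bits): array
{
    $symbols = [];
    $node = $root;
    $length = strlen($bits);
    for ($i = 0; $i < $length; $i++) {
        $bit = (int) $bits[$i];
        if (!isset($node[$bit])) {
            // No code continues this way: the error leaf was reached.
            throw new UnexpectedValueException("Invalid code at bit offset $i");
        }
        $node = $node[$bit];
        if (array_key_exists('run', $node)) {
            $symbols[] = $node['run']; // emit the symbol ...
            $node = $root;             // ... and restart at the root
        }
    }
    return $symbols;
}

$runs = decodeSymbols(TINY_BLACK_TREE, '11' . '010' . '10' . '011');
// $runs is [2, 1, 3, 4]
```

A real fax decoder would read the bits from the packed record instead of a string and would switch between the white tree and the black tree as the runs alternate; both are left out here to keep the walk itself visible.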
