
PHP Huffman Decode Algorithm

I applied for a job recently and was sent a HackerRank exam with a couple of questions. One of them was a Huffman decoding algorithm. There is a similar problem available here which explains the format a lot better than I can.

The actual task was to take two arguments and return the decoded string.

The first argument is the codes, which is a string array like:

[
    "a      00",
    "b      101",
    "c      0111",
    "[newline]      1001"
]

Each entry is: a single character, two tabs, then the Huffman code.

The newline was specified in this format because of the way HackerRank is set up.

The second argument is a string to decode using the codes. For example:

101000111 = bac

This is my solution:

function decode($codes, $encoded) {
    $returnString = '';
    $codeArray = array();

    foreach($codes as $code) {
        sscanf($code, "%s\t\t%s", $letter, $code);
        if ($letter == "[newline]")
            $letter = "\n";
        $codeArray[$code] = $letter;
    }
    print_r($codeArray);

    $numbers = str_split($encoded);
    $searchCode = '';
    foreach ($numbers as $number) {
        $searchCode .= $number;
        if (isset($codeArray[$searchCode])) {
            $returnString .= $codeArray[$searchCode];
            $searchCode = '';
        }
    }

    return $returnString;
}

It passed the two initial tests, but there were another five hidden tests which it did not pass and which gave no feedback.

I realize that this solution would not pass if the character was a white space, so I tried a less optimal solution that used substr to get the first character and regex matching to get the number, but this still passed the first two and failed the hidden five. I tried the function on the HackerRank platform with white space as input and the sandboxed environment could not handle it anyway, so I reverted to the above solution as it was more elegant.

I tried the code with special characters, characters from other languages, and codes of various sizes, and it always returned the desired solution.

I am just frustrated that I could not find the cases that caused this to fail, as I found this to be an elegant solution. I would love some feedback on why this could fail given that there is no white space, and also any feedback on performance improvements.

Your basic approach is sound. Since a Huffman code is a prefix code, i.e. no code is a prefix of another, if your search finds a match then that must be the code. The second half of your code would work with any proper Huffman code and any message encoded using it.
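To make the prefix property concrete, here is a minimal sketch that checks whether a list of codes is prefix-free. The helper name isPrefixFree is hypothetical and not part of the original question; it assumes the codes are passed as plain strings.

```php
<?php
// Hypothetical helper: returns true when no code is a prefix of another.
function isPrefixFree(array $codes): bool
{
    // Sort as strings (not numerically). In that order, a code that is a
    // prefix of any other code is also a prefix of its immediate successor,
    // so comparing neighbours is enough.
    sort($codes, SORT_STRING);
    for ($i = 0; $i + 1 < count($codes); $i++) {
        if (strncmp($codes[$i + 1], $codes[$i], strlen($codes[$i])) === 0) {
            return false;
        }
    }
    return true;
}

var_dump(isPrefixFree(['00', '101', '0111', '1001'])); // bool(true)
var_dump(isPrefixFree(['0', '01', '11']));             // bool(false): "0" is a prefix of "01"
```

When this check passes, the greedy bit-by-bit lookup can never match a shorter code that was really the start of a longer one.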

Some comments. First, the example you provide is not a Huffman code, since the prefixes 010, 0110, 1000, and 11 are not present. Huffman codes are complete, whereas this prefix code is not.
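One way to see the incompleteness is the Kraft sum: a binary prefix code is complete exactly when the sum of 2^(-length) over all of its codes equals 1. The sketch below is only an illustration; the function name kraftSum is an assumption of mine.

```php
<?php
// Hypothetical helper: Kraft sum of a binary code, i.e. the sum of 2^(-length).
function kraftSum(array $codes): float
{
    $sum = 0.0;
    foreach ($codes as $code) {
        $sum += pow(2, -strlen((string) $code));
    }
    return $sum;
}

// The codes from the question leave half of the code space unused.
echo kraftSum(['00', '101', '0111', '1001']), "\n"; // 0.5

// Adding the missing prefixes 010, 0110, 1000 and 11 makes the code complete.
echo kraftSum(['00', '010', '0110', '0111', '1000', '1001', '101', '11']), "\n"; // 1
```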

This brings up a second issue, which is that you do not detect this error. You should be checking to see if $searchCode is empty after the end of your loop. If it is not, then either the prefix code was not complete or the message ended in the middle of a code. Either way, the message is corrupt with respect to the provided prefix code. Did the question specify what to do with errors?
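A minimal sketch of that check follows. It assumes the codes have already been parsed into an array mapping each code to its character, and it reports the error by throwing an exception. Both the function name decodeStrict and the choice of an exception are assumptions, since the question does not say how errors should be handled.

```php
<?php
// Hypothetical variant of the decoder that rejects undecodable leftover bits.
// $codeMap is assumed to map each code string to its decoded character.
function decodeStrict(array $codeMap, string $encoded): string
{
    $decoded = '';
    $searchCode = '';
    foreach (str_split($encoded) as $bit) {
        $searchCode .= $bit;
        if (isset($codeMap[$searchCode])) {
            $decoded .= $codeMap[$searchCode];
            $searchCode = '';
        }
    }
    // Anything left over never matched a code, so the message is corrupt
    // with respect to this prefix code.
    if ($searchCode !== '') {
        throw new UnexpectedValueException("Undecodable trailing bits: $searchCode");
    }
    return $decoded;
}

$codeMap = ['00' => 'a', '101' => 'b', '0111' => 'c', '1001' => "\n"];
echo decodeStrict($codeMap, '101000111'), "\n"; // bac
```

With the same map, an input such as 10100011 decodes "ba" and then throws, because the trailing 011 is not a code.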

The only real issue I would expect with this code is that you did not decode the code description generally enough. Did the question say there were always two tabs, or did you conclude that? Perhaps it was just any amount of spaces and tabs. Were there other character encodings you needed to convert, like [newline]? I presume you did in fact need to convert them, if one of the examples that worked contained one. Did it? Otherwise, maybe you weren't supposed to convert.
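As an illustration of a more tolerant parser, the sketch below takes the first character(s) of the line as the symbol, then any run of spaces or tabs, then the code. This is a guess at what the hidden tests might need, not something the question confirms. The name parseCodes is hypothetical, and [newline] is assumed to be the only placeholder.

```php
<?php
// Hypothetical parser: builds a code => character map from the description lines.
function parseCodes(array $lines): array
{
    $codeMap = [];
    foreach ($lines as $line) {
        // (.+?)   the symbol, matched lazily so it may itself be a space or tab
        // \s+     any amount of separating whitespace
        // ([01]+) the code, made of 0s and 1s only
        if (!preg_match('/^(.+?)\s+([01]+)\s*$/', $line, $m)) {
            throw new InvalidArgumentException("Unrecognised code line: $line");
        }
        $codeMap[$m[2]] = ($m[1] === '[newline]') ? "\n" : $m[1];
    }
    return $codeMap;
}

// A space as the symbol, which sscanf with %s cannot capture, parses fine here.
$codeMap = parseCodes(["a\t\t00", " \t\t101", "[newline]  1001"]);
var_dump($codeMap['101'] === ' '); // bool(true)
```

Unlike sscanf with %s, which skips leading whitespace, the regular expression keeps a whitespace symbol intact.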

I had the same question in a coding challenge, with some modification, as the input was a List like (a 111101, b 110010, [newline] 111111 ....)

I took a different approach to solve it, using a HashMap, but I too had only the 2 sample test cases pass.

Below is my code:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The enclosing class was not shown in the original post; the name Result is assumed.
class Result {
    public static String decode(List<String> codes, String encoded) {
        // Map each Huffman code to the symbol it stands for.
        Map<String, String> codeMap = new HashMap<>();
        for (String line : codes) {
            String[] splitData = line.split("\\s+");
            String value = splitData[0];
            String key = splitData[1].trim();
            codeMap.put(key, value);
        }

        // Extend the candidate code one bit at a time until it matches a known code.
        StringBuilder result = new StringBuilder();
        String buildValue = "";
        for (int j = 0; j < encoded.length(); j++) {
            buildValue += encoded.charAt(j);
            if (codeMap.containsKey(buildValue)) {
                String symbol = codeMap.get(buildValue);
                result.append(symbol.contains("[newline]") ? "\n" : symbol);
                buildValue = "";
            }
        }
        return result.toString();
    }
}
