简体   繁体   中英

Creating a Huffman Tree in PHP

I am trying to create a Huffman Tree using PHP.

This is what I have:

<!DOCTYPE html>
<html>
    <body>
        <?php
            $extract = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse elementum imperdiet aliquet. Duis non molestie orci. Ut eget nibh nec augue ultricies porttitor.';
            $characters = count_chars($extract, 1);            
            asort($characters);
            foreach($characters as $character => $occurrence)
            {
               echo 'There';
               if($occurrence > 1)
               {
                   echo ' were ' . $occurrence . ' occurrences of ';
               }
               else
               {
                   echo ' was '. $occurrence . ' occurrence of ';
               }
               echo '"<strong>' . chr($character) . '</strong>" in the extract.<br />';
               $characterFreq[chr($character)] = $occurrence;              
            }
            print_r($characterFreq);
        ?>
    </body>
</html>

Which outputs:

There was 1 occurrence of "S" in the extract.
There was 1 occurrence of "U" in the extract.
There was 1 occurrence of "h" in the extract.
There was 1 occurrence of "L" in the extract.
There was 1 occurrence of "D" in the extract.
There was 1 occurrence of "," in the extract.
There was 1 occurrence of "q" in the extract.
There was 1 occurrence of "b" in the extract.
There were 3 occurrences of "g" in the extract.
There were 4 occurrences of "a" in the extract.
There were 4 occurrences of "." in the extract.
There were 4 occurrences of "d" in the extract.
There were 5 occurrences of "p" in the extract.
There were 6 occurrences of "c" in the extract.
There were 6 occurrences of "l" in the extract.
There were 7 occurrences of "m" in the extract.
There were 8 occurrences of "n" in the extract.
There were 8 occurrences of "r" in the extract.
There were 9 occurrences of "u" in the extract.
There were 9 occurrences of "o" in the extract.
There were 10 occurrences of "s" in the extract.
There were 15 occurrences of "t" in the extract.
There were 17 occurrences of "i" in the extract.
There were 20 occurrences of "e" in the extract.
There were 22 occurrences of " " in the extract.
Array ( [S] => 1 [U] => 1 [h] => 1 [L] => 1 [D] => 1 [,] => 1 [q] => 1 [b] => 1 [g] => 3 [a] => 4 [.] => 4 [d] => 4 [p] => 5 [c] => 6 [l] => 6 [m] => 7 [n] => 8 [r] => 8 [u] => 9 [o] => 9 [s] => 10 [t] => 15 [i] => 17 [e] => 20 [ ] => 22 )

I have been using mixtures of array_slice() , array_splice() and array_unshift() but am having trouble with recursion.

Ideally, leaves and branches will be denoted by array indices 0 & 1.

Any ideas on how to make a Huffman tree in multidimensional array form would be greatly appreciated.

It's a full solution to the problem, in PHP:

function huffmannEncode($string) {
    $originalString = $string;
    $occurences = array();

    while (isset($string[0])) {
        $occurences[] = array(substr_count($string, $string[0]), $string[0]);
        $string = str_replace($string[0], '', $string);
    }

    sort($occurences);
    while (count($occurences) > 1) {
        $row1 = array_shift($occurences);
        $row2 = array_shift($occurences);
        $occurences[] = array($row1[0] + $row2[0], array($row1, $row2));
        sort($occurences);
    }

    // $dictionary is an array that gets filled with the values with a recursive method
    $dictionary = [];
    fillDictionary($dictionary, is_array($occurences[0][1]) ? $occurences[0][1] : $occurences);

    // Generate the final encoded message
    $encoded = '';
    for($i = 0; $i < strlen($originalString); $i++) {
        $encoded .= $dictionary[$originalString[$i]];
    }
    return $encoded;
}

// This function runs recursively to generate the Huffmann tree
function fillDictionary(&$dictionary, $data, $value = '') {
    if (!is_array($data[0][1])) {
        $dictionary[$data[0][1]] = $value.'0';
    } else {
        fillDictionary($dictionary, $data[0][1], $value.'0');
    }
    if (isset($data[1])) {
        if (!is_array($data[1][1])) {
            $dictionary[$data[1][1]] = $value.'1';
        } else {
            fillDictionary($dictionary, $data[1][1], $value.'1');
        }
    }
}

This function calculates the occurrences, and with a recursive sub-function it generates the dictionary that assigns a binary code to each character.

Here is an example:

// Test the functionality:
echo huffmannEncode('hello world');
// Output: 00100010101101110011110010101111
// And the dictionary:
/* 
[e] => 000
[h] => 001
[r] => 010
[w] => 011
[l] => 10
[o] => 110
[ ] => 1110
[d] => 1111
*/

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM