简体   繁体   中英

I don't understand this Huffman algorithm implementation

    template<class T>
    void huffman(MinHeap<TreeNode<T>*> heap, int n)
    {
      for(int i=0;i<n-1;i++)
      {
        TreeNode<T> *first = heap.pop();
        TreeNode<T> *second = heap.pop();
        TreeNode<T> *bt = new BinaryTreeNode<T>(first, second, first.data, second.data);
        heap.push(bt);
      }
    }

In my Fundamentals of Data Structures in C++ textbook, it gave a 2 page definition of Huffman coding, and the code above. To me, the book wasn't enough detailed, so I've done the googling and I learned how the process of Huffman coding works. The textbook claims that at the end of the code above, a Huffman tree is made. But to me it seems wrong, because a Huffman tree, is not necessary a complete tree, but the code above seems to always give a complete tree because of the heap.push() . So can someone explain to me how this piece of code is not wrong?

The heap's tree structure does not necessarily match the resulting Huffman tree -- rather, the heap contains a forest of partial Huffman trees, initially each consisting of a single symbol node. The loop then repeatedly takes the two nodes with the least weight, combines them into one node, and puts the resulting combined node back. At the end of the process, the heap contains one finished tree.

Huffman encoding works by taking the two lowest value items at each step. When you first call the function (since your MinHeap is sorted by value) the two lowest value items are popped off and "combined" into a decision node which is then put back into the heap. That node is scored by the sum of its child scores and put back into the heap. Inserting it back into the heap puts it into the right place based on its score; if it's still lower than any other items it'll be first, otherwise it'll be somewhere else.

So this algorithm is building the tree from the bottom up, and by the time you empty the heap you'll have a complete tree. I don't understand what the 'n' is for, though; the loop should be while (heap.size() > 1) . Regardless, the tree is not "full", different branches will be different lengths depending on how the frequencies of the items in the initial heap are scored.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM