What is the time and space complexity of Huffman encoding?

I've implemented a Huffman encoding algorithm using two HashMaps keyed by each unique character: one stores the character's frequency and the other stores its code. I am unsure how to determine the time and space complexity. Can anyone help me work out what they are and explain why?

Some thoughts I have: each unique character is stored once in each HashMap, so is the space complexity O(2*n) = O(n)?

I've read that the time complexity is O(n log n) (for some solutions), but I do not understand why that is.


First I build a frequency map from a string:

/**
 * builds a list of the occurring characters in a text and
 * counts the frequency of these characters
 */
public void buildFrequencyMap(String string) {
    frequencyMap = new HashMap<Character, Integer>(); //key, value
    char[] stringArray = string.toCharArray();

    for(char c : stringArray) {
        if(frequencyMap.containsKey(c)) { //if the character has already been stored, increment its count
            frequencyMap.put(c, frequencyMap.get(c) + 1);
        } else { //otherwise store it with a count of 1
            frequencyMap.put(c, 1);
        }
    }
}

Then I build the tree:

/**
 * builds the huffman tree
 */
public Node buildTree() {
    PriorityQueue<Node> queue = new PriorityQueue<Node>();

    //fill the queue with one leaf per character that actually occurs;
    //iterating over the map (java.util.Map.Entry) also covers characters above code 255
    for(Map.Entry<Character, Integer> entry : frequencyMap.entrySet()) {
        if(entry.getValue() > 0) {
            queue.add(new Node(entry.getKey(), entry.getValue(), null, null)); //create new leaf
        }
    }

    //if the input consists of only one distinct character, add a dummy leaf so that character still gets a 1-bit code
    if(queue.size() == 1) {
        queue.add(new Node('\0', 1, null, null));
    }
    //repeatedly merge the two lowest-frequency nodes until only one node remains; that node becomes the root
    while(queue.size() > 1) {
        Node left = queue.poll(); //first extracted child becomes the left child
        Node right = queue.poll(); //second extracted child becomes the right child
        Node parent = new Node('\0', (left.frequency + right.frequency), left, right);
        queue.add(parent);
    }
    return root = queue.poll(); //the remaining node in the queue becomes the root
}
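
The tree-building code relies on a Node class that the question does not show. A PriorityQueue created without a comparator needs its elements to be Comparable, otherwise adding nodes fails with a ClassCastException at runtime. Below is a minimal sketch of what such a class might look like, assuming the field and method names used in the question (character, frequency, leftChild, rightChild, isLeaf). The wrapper class HuffmanNodeSketch and the build helper are hypothetical and exist only to make the example runnable.

```java
import java.util.PriorityQueue;

class HuffmanNodeSketch {
    // Hypothetical Node class; the question does not show its own definition.
    static class Node implements Comparable<Node> {
        final char character;
        final int frequency;
        final Node leftChild;
        final Node rightChild;

        Node(char character, int frequency, Node leftChild, Node rightChild) {
            this.character = character;
            this.frequency = frequency;
            this.leftChild = leftChild;
            this.rightChild = rightChild;
        }

        boolean isLeaf() {
            return leftChild == null && rightChild == null;
        }

        // Order nodes by frequency so the queue always yields the lightest node first.
        @Override
        public int compareTo(Node other) {
            return Integer.compare(frequency, other.frequency);
        }
    }

    // Same merge loop as in the question, fed from plain arrays instead of a map.
    static Node build(char[] symbols, int[] frequencies) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (int i = 0; i < symbols.length; i++) {
            queue.add(new Node(symbols[i], frequencies[i], null, null));
        }
        while (queue.size() > 1) {
            Node left = queue.poll();
            Node right = queue.poll();
            queue.add(new Node('\0', left.frequency + right.frequency, left, right));
        }
        return queue.poll(); // null when there were no symbols at all
    }

    public static void main(String[] args) {
        Node root = build(new char[] {'a', 'b', 'c'}, new int[] {5, 2, 1});
        System.out.println("root frequency: " + root.frequency);
    }
}
```

The root's frequency equals the sum of all leaf frequencies, which is a quick sanity check for any implementation of the merge loop.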

Lastly, the code map is built:

/**
 * builds the codemap
 */
public void buildCodeMap() {
    codeMap = new HashMap<Character, String>(); //key, value
    buildCode(root, "", codeMap);
}

public void buildCode(Node node, String code, HashMap<Character, String> codeMap) {
    if(!node.isLeaf()) { //if the current node is NOT a leaf
        buildCode(node.leftChild, code + '0', codeMap); //each time we go down at the LEFT side of the tree, encode a 0
        buildCode(node.rightChild, code + '1', codeMap); //each time we go down at the RIGHT side of the tree, encode a 1
    } else { //otherwise
        codeMap.put(node.character, code);
    }
}

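Huffman coding takes O(n log n) time, unless the frequencies are already sorted, in which case it takes O(n) time. Here n is the number of distinct symbols being coded, not the length of the input. The log factor comes from the priority queue: building the tree takes n - 1 merges, and each merge does two removals and one insertion on a queue of at most n nodes.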
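
The O(n) case for pre-sorted frequencies is usually achieved with two plain FIFO queues instead of a heap: one holds the leaves in ascending order, the other holds merged nodes, which are produced in non-decreasing weight order and therefore stay sorted on their own. The following is a sketch of that idea, not the asker's code; the class SortedHuffmanSketch, its Node type, and the helper names are all made up for illustration, and the input array is assumed to be sorted ascending already.

```java
import java.util.ArrayDeque;

class SortedHuffmanSketch {
    // Hypothetical minimal node: a weight plus two children (null for leaves).
    static class Node {
        final long weight;
        final Node left;
        final Node right;

        Node(long weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    // Removes and returns the lighter of the two queue heads; ties go to the leaf queue.
    private static Node takeMin(ArrayDeque<Node> leaves, ArrayDeque<Node> merged) {
        if (merged.isEmpty()) return leaves.poll();
        if (leaves.isEmpty()) return merged.poll();
        return leaves.peek().weight <= merged.peek().weight ? leaves.poll() : merged.poll();
    }

    // Builds the tree in O(n), ASSUMING sortedWeights is already in ascending order.
    static Node build(long[] sortedWeights) {
        ArrayDeque<Node> leaves = new ArrayDeque<>();
        ArrayDeque<Node> merged = new ArrayDeque<>();
        for (long w : sortedWeights) {
            leaves.add(new Node(w, null, null));
        }
        while (leaves.size() + merged.size() > 1) {
            Node a = takeMin(leaves, merged);
            Node b = takeMin(leaves, merged);
            merged.add(new Node(a.weight + b.weight, a, b)); // O(1), no heap involved
        }
        return leaves.isEmpty() ? merged.poll() : leaves.poll();
    }

    // Total encoded length in bits: sum over leaves of weight * depth.
    static long cost(Node node, int depth) {
        if (node == null) return 0;
        if (node.left == null) return node.weight * depth;
        return cost(node.left, depth + 1) + cost(node.right, depth + 1);
    }

    public static void main(String[] args) {
        Node root = build(new long[] {1, 1, 2, 3, 5});
        System.out.println("total bits: " + cost(root, 0));
    }
}
```

Every queue operation here is constant time, so the n - 1 merges cost O(n) in total. If the frequencies are not sorted, sorting them first costs O(n log n), which gives the same overall bound as the heap-based version.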

Note that at least one of the priority queue operations (insertion, finding the minimum, or deleting the minimum) takes O(log n) time. Which one depends on the implementation of the priority queue.
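
For java.util.PriorityQueue, which the question uses, the class documentation states that add, offer and poll run in O(log n) time while peek is constant time. One rough way to observe the resulting n log n growth is to count how many comparisons the queue performs during the merge loop. The sketch below does that with a counting comparator; the class HeapCostSketch and the choice of weights 1..n are arbitrary assumptions made for the demonstration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class HeapCostSketch {
    // Runs the Huffman merge loop on the weights 1..n and returns
    // how many comparisons the priority queue performed in total.
    static long countComparisons(int n) {
        long[] counter = {0};
        Comparator<Long> counting = (a, b) -> {
            counter[0]++;
            return Long.compare(a, b);
        };
        PriorityQueue<Long> queue = new PriorityQueue<>(counting);
        for (int i = 1; i <= n; i++) {
            queue.add((long) i);
        }
        while (queue.size() > 1) {
            long merged = queue.poll() + queue.poll(); // two removals
            queue.add(merged);                         // one insertion
        }
        return counter[0];
    }

    public static void main(String[] args) {
        for (int n = 1 << 10; n <= 1 << 16; n <<= 2) {
            double log2n = Math.log(n) / Math.log(2);
            double ratio = countComparisons(n) / (n * log2n);
            System.out.printf("n=%d  comparisons/(n log2 n)=%.2f%n", n, ratio);
        }
    }
}
```

If the ratio printed for each n stays within a small constant range as n grows, the comparison count is growing in proportion to n log n.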
