
FileInputStream and Huffman Tree

I am building a Huffman tree to compress a text file, but I am having some issues. The method I am writing is supposed to take a FileInputStream that supplies the text data and return a Map of the characters and their counts. To do that, I need to choose a size for the byte[] that stores the data. The problem is that the byte[] needs to be exactly the right length, or else the Map will also contain unneeded data. Is there a way to make the byte[] just the right size?

Here is my code:

// counts the characters in an input file and places the counts in a map
public static Map<Character, Integer> getCounts(FileInputStream input)
        throws IOException {
    Map<Character, Integer> output = new TreeMap<Character, Integer>(); // treemap keeps keys in sorted order (chars alphabetized)
    byte[] fileContent = new byte[100]; // creates a byte[]
    //ArrayList<Byte> test = new ArrayList<Byte>();
    input.read(fileContent);                // reads the input into fileContent
    String test = new String(fileContent);  // contains entire file into this string to process

    // goes through each character of the String to put chars as keys and occurrences as values
    for (int i = 0; i < test.length(); i++) {
        char temp = test.charAt(i);
        if (output.containsKey(temp)) { // seen this character before; increase count

            int count = output.get(temp);
            System.out.println("repeat; char is: " + temp + "count is: " + count);
            output.put(temp, count + 1);
        } else {                        // Haven't seen this character before; create count of 1
            System.out.println("new; char is: " + temp + "count is: 1");
            output.put(temp, 1);
        }
    }
    return output;
}

FileInputStream.read(byte[]) returns the number of bytes actually read, or -1 at EOF. You can use this value instead of test.length() as the bound of the for loop, so that the unused tail of the buffer is never counted.
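As a rough illustration of that point, the sketch below does a single read into an oversized buffer and decodes only the bytes that were actually filled in. The class name SingleReadDemo and the helper readOnce are hypothetical, and the code assumes single-byte text (ISO-8859-1), so each byte maps to exactly one char. It takes an InputStream so it can be tried without a file; a FileInputStream can be passed in unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SingleReadDemo {
    // Hypothetical helper: one read() call, then decode only the filled part of the buffer.
    static String readOnce(InputStream input, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int bytesRead = input.read(buffer);   // number of bytes actually read, or -1 at EOF
        if (bytesRead == -1) {
            return "";                        // stream was already at EOF
        }
        // Assumption: single-byte text, so ISO_8859_1 maps every byte to one char.
        return new String(buffer, 0, bytesRead, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.ISO_8859_1);
        String text = readOnce(new ByteArrayInputStream(data), 100);
        // The buffer holds 100 bytes, but only the 5 that were read are decoded.
        System.out.println(text.length());
    }
}
```

Decoding the whole 100-byte buffer instead would append NUL characters for the unused slots, which is where the unneeded Map entries come from.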

Note that read() is not guaranteed to fill the whole buffer, even if the end of the file has not been reached, so it is usually called in a loop:

byte[] buf = new byte[4096]; // any reasonable buffer size works
int bytesRead;

// Read until there are no more bytes to read.
while ((bytesRead = input.read(buf)) != -1) {
    // The first bytesRead bytes of buf hold the next chunk of data here
}
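Applied to the counting problem, the loop could look something like the sketch below. The class name ByteCounts is hypothetical, the 4096-byte buffer size is arbitrary, and the code assumes single-byte text, so each byte is treated as one character. The counts are updated chunk by chunk, so the buffer never has to match the file size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class ByteCounts {
    // Counts every byte in the stream, treating each byte as one character
    // (assumption: single-byte encoding such as ASCII or ISO-8859-1).
    static Map<Character, Integer> getCounts(InputStream input) throws IOException {
        Map<Character, Integer> counts = new TreeMap<>(); // keys kept in sorted order
        byte[] buf = new byte[4096];
        int bytesRead;
        while ((bytesRead = input.read(buf)) != -1) {
            // Only the first bytesRead entries of buf are valid for this chunk.
            for (int i = 0; i < bytesRead; i++) {
                char c = (char) (buf[i] & 0xFF);  // mask to avoid negative byte values
                counts.merge(c, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abracadabra".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(getCounts(new ByteArrayInputStream(data)));
        // prints {a=5, b=2, c=1, d=1, r=2}
    }
}
```

The method accepts any InputStream, so the FileInputStream from the question can be passed in directly.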

Finally, if your text is in a multi-byte encoding such as UTF-8, this approach will not work, since read() can stop in the middle of a character. Consider wrapping the FileInputStream in an InputStreamReader:

Reader fileReader = new InputStreamReader(input, "UTF-8");

int charsRead;
char[] buf = new char[256];

while ((charsRead = fileReader.read(buf)) != -1) {
    // The first charsRead characters of buf hold the next chunk of text here
}
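Putting the Reader loop together with the counting step might look like the following sketch. The class name CharCounts is hypothetical, and the file is assumed to be UTF-8. Counts are per Java char, so a character outside the Basic Multilingual Plane would show up as two surrogate entries.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class CharCounts {
    // Counts decoded characters rather than raw bytes (assumption: UTF-8 input).
    // The caller keeps ownership of the stream and is responsible for closing it.
    static Map<Character, Integer> getCounts(InputStream input) throws IOException {
        Map<Character, Integer> counts = new TreeMap<>();
        Reader reader = new InputStreamReader(input, StandardCharsets.UTF_8);
        char[] buf = new char[256];
        int charsRead;
        while ((charsRead = reader.read(buf)) != -1) {
            // Only the first charsRead entries of buf are valid for this chunk.
            for (int i = 0; i < charsRead; i++) {
                counts.merge(buf[i], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "h", then U+00E9 twice; U+00E9 takes two bytes in UTF-8 but is one char.
        byte[] data = "h\u00e9\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(getCounts(new ByteArrayInputStream(data)));
    }
}
```

Because the reader does the decoding, a multi-byte character that straddles two underlying byte reads is still counted once.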


 