
Java writing byte into a .txt file

I am practicing Huffman encoding for my programming class. I have done almost all of the encoding part. For example, I have assigned each character a code (e.g. a=100100) and converted each char in the text according to its code. Then I parse each code into a Byte, for example parsing 100100 into a Byte, and store it in a List of Byte. However, I need to write all the Bytes into a .txt file, and I realized there is a problem.

Example: one character has the code "1001" and it will be written into the .txt file as 1 byte instead of just 4 bits.

I know that after Huffman encoding, characters are stored in a format like "11100111101011111101011011111000010000101", but in my situation each character takes 1 byte, so the output is no different in size from the original input file before encoding.

Is there any way to store the code in the format like "11100111101011111101011011111000010000101"?

Sorry for my English, I tried my best to explain my confusion.

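// Needs: import java.io.BufferedWriter; import java.io.FileWriter;
try (BufferedWriter bfw = new BufferedWriter(new FileWriter("out.txt"))) {
    char[] buffer = str.toCharArray();
    for (int i = 0; i < buffer.length; i++) {
        // Integer.toBinaryString is static; the mask keeps the low 8 bits.
        // Note: this writes '0'/'1' as text characters, not as packed bits.
        bfw.write(Integer.toBinaryString((byte) buffer[i] & 0xFF));
    }
}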

You could use a BitSet object if you intend to keep all bits in memory.

BitSet bits = new BitSet();
bits.set(7000, true);
if (bits.get(7000)) { ... }
byte[] bytes = bits.toByteArray();

Path path = Paths.get("C:/Temp/huffman.bin");
Files.write(path, bytes);
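
As a rough illustration of the BitSet idea, the sketch below turns a string of '0' and '1' characters into a BitSet and then into bytes. The class name BitSetPacking and the helper toBitSet are made up for this example. Two things to keep in mind: toByteArray() puts bit 0 in the least significant bit of the first byte, and it only covers bits up to the highest set bit, so trailing 0 bits of the encoded data are not represented and the total bit count has to be kept somewhere else.

```java
import java.util.BitSet;

public class BitSetPacking {
    // Hypothetical helper: bit i of the BitSet is set when code.charAt(i) is '1'.
    public static BitSet toBitSet(String code) {
        BitSet bits = new BitSet(code.length());
        for (int i = 0; i < code.length(); i++) {
            if (code.charAt(i) == '1') {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String encoded = "1110011110101111"; // 16 bits, last bit is 1
        byte[] bytes = toBitSet(encoded).toByteArray();
        System.out.println(encoded.length() + " bits -> " + bytes.length + " bytes");
    }
}
```

The resulting array can then be passed to Files.write as above.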

Working with bytes directly, without a BitSet, is also feasible.

However, you cannot write chars; the character encoding conversion messes things up. Keep in mind that a Java char is 16 bits, a UTF-16 code unit meant to hold Unicode text.

This writes binary data, not text.
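
One way to pack the bits into bytes by hand might look like the sketch below. The class name BitPacker and the method pack are invented for this example, and the choice to fill each byte from the most significant bit and to pad the last byte with 0 bits is an assumption, not something the question or answer prescribes.

```java
import java.io.ByteArrayOutputStream;

public class BitPacker {
    // Hypothetical helper: packs a string of '0'/'1' characters into bytes,
    // most significant bit first; the last byte is padded with 0 bits.
    public static byte[] pack(String code) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int current = 0; // bits collected so far for the current byte
        int count = 0;   // how many bits 'current' holds
        for (int i = 0; i < code.length(); i++) {
            current = (current << 1) | (code.charAt(i) == '1' ? 1 : 0);
            count++;
            if (count == 8) {
                out.write(current); // writes the low 8 bits
                current = 0;
                count = 0;
            }
        }
        if (count > 0) {
            out.write(current << (8 - count)); // shift left so padding zeros are at the end
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] packed = pack("1001");
        System.out.println(packed.length); // the four bits plus padding fit in one byte
    }
}
```

The returned array is binary data, so it should go to the file through Files.write or an OutputStream, not through a Writer.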

For trailing bits, I do not know how Huffman deals with them, so do a bit of research. I think 0 bits will do and will not generate artifacts. Another option may be to add the first 0-7 bits of a longer code. "Padding" is the keyword.
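
One common way to make padding unambiguous, offered here only as an assumption and not as what Huffman coding itself requires, is to store the number of valid bits in front of the packed data so a decoder knows where to stop. The class name PaddedOutput, the method withBitCount, and the 4-byte header layout are all hypothetical.

```java
import java.nio.ByteBuffer;

public class PaddedOutput {
    // Hypothetical layout: a 4-byte big-endian bit count, followed by the packed bytes.
    public static byte[] withBitCount(int bitCount, byte[] packed) {
        return ByteBuffer.allocate(4 + packed.length)
                .putInt(bitCount) // number of meaningful bits in 'packed'
                .put(packed)      // payload, last byte possibly padded
                .array();
    }

    public static void main(String[] args) {
        // 0x90 is 10010000: the code "1001" followed by four padding bits.
        byte[] data = withBitCount(4, new byte[] { (byte) 0x90 });
        System.out.println(data.length);
    }
}
```

A decoder reading this layout would take the first four bytes as the bit count and ignore any bits beyond it in the final byte.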
