Java: writing bytes to a .txt file

Question

I am practicing Huffman encoding for my programming class. I have finished almost all of the encoding part. For example, I have assigned each character a code (e.g. a=100100) and converted each char in the text according to its code. Then I parse each code into a Byte and store it in a List of Byte, for example parsing 100100 into one Byte. However, I need to write all the Bytes to a .txt file, and I realized there is a problem.

Example: a character with the code "1001" is written to the .txt file as 1 byte instead of just 4 bits.

I know that after Huffman encoding, the characters are stored in a format like "11100111101011111101011011111000010000101", but in my case each character takes 1 byte, so the output is no smaller than the original input file before encoding.

Is there any way to store the codes in a format like "11100111101011111101011011111000010000101"?

Sorry for my English; I tried my best to explain my confusion.

Answer 1

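import java.io.BufferedWriter;
import java.io.FileWriter;

// Writes each character of str as a run of binary digits.
// Note: the digits are written as text, so every '0' or '1' still takes one character in the file.
try (FileWriter fw = new FileWriter("out.txt");
     BufferedWriter bfw = new BufferedWriter(fw)) {
    for (char c : str.toCharArray()) {
        // Keep only the low 8 bits of the char, then format them as binary digits.
        bfw.write(Integer.toBinaryString(c & 0xFF));
    }
}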

Answer 2

You could use a BitSet object if you intend to keep all bits in memory.

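import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.BitSet;

BitSet bits = new BitSet();
bits.set(7000, true);             // set bit number 7000
if (bits.get(7000)) { ... }       // test a single bit
byte[] bytes = bits.toByteArray();

Path path = Paths.get("C:/Temp/huffman.bin");
Files.write(path, bytes);         // writes the raw bytes to the file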
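
Two details of BitSet matter when it holds Huffman output. toByteArray() places bit 0 in the least significant bit of the first byte, and it stops at the highest set bit, so trailing zero bits are not represented. The sketch below illustrates this; the class name BitSetDemo and the helper fromBitString are made up for this example and are not part of the answer above.

```java
import java.util.BitSet;

/** Sketch: collecting code bits in a BitSet. All names here are hypothetical. */
final class BitSetDemo {

    /** Sets one bit per '1' character of a code string such as "1001". */
    static BitSet fromBitString(String bitString) {
        BitSet bits = new BitSet(bitString.length());
        for (int i = 0; i < bitString.length(); i++) {
            if (bitString.charAt(i) == '1') {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String encoded = "100100000"; // 9 bits, the last five are zeros
        BitSet bits = fromBitString(encoded);
        byte[] bytes = bits.toByteArray();
        // The trailing zeros are dropped: only bits 0..3 are represented.
        System.out.println(bits.length());  // 4
        System.out.println(bytes.length);   // 1
        System.out.println(bytes[0]);       // 9, because bit 0 is the least significant bit
    }
}
```

Because of this, a decoder probably needs the real bit count stored somewhere, for instance as a number at the start of the file.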

Working with bytes directly, without a BitSet, is also feasible.

However, you cannot write chars: writing them goes through a character encoding conversion, which corrupts the data. Keep in mind that a Java char is a 16-bit UTF-16 code unit meant to hold Unicode text.

This writes binary data, not text.
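
To make the char conversion concrete, here is a small sketch that writes the same 8-bit value once through a Writer and once through a plain byte stream. It assumes UTF-8 as the text encoding; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Sketch: why packed bits should not go through a Writer. Names are hypothetical. */
final class CharVersusByteDemo {

    /** Writes the value through a Writer, so it is treated as a character and encoded. */
    static byte[] viaWriter(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Writes the value straight to the stream as one raw byte. */
    static byte[] viaStream(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int packed = 0b1001_0000; // eight packed bits, value 0x90
        System.out.println(viaWriter(packed).length); // 2 (encoded as C2 90 in UTF-8)
        System.out.println(viaStream(packed).length); // 1
    }
}
```

For a real file, a FileOutputStream or Files.write plays the role of the byte stream here.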

As for trailing bits, I do not know how Huffman coding handles them, so do a bit of research. I think padding with 0 bits will work and not generate artifacts. Another option might be to pad with the first 0-7 bits of a longer code. "Padding" is the keyword to search for.
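
One possible way to handle the padding is sketched below. It packs a string of '0' and '1' characters into bytes, most significant bit first, fills the last byte with zero bits, and relies on the caller keeping the real bit count so the padding can be ignored when reading back. This is only an illustration under those assumptions, not the standard Huffman file layout; BitPacker, pack and unpack are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

/** Sketch: packing a bit string into bytes with zero padding. Names are hypothetical. */
final class BitPacker {

    /** Packs '0'/'1' characters into bytes, most significant bit first. */
    static byte[] pack(String bitString) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int current = 0; // bits collected for the byte in progress
        int filled = 0;  // how many bits of 'current' are in use
        for (int i = 0; i < bitString.length(); i++) {
            current = (current << 1) | (bitString.charAt(i) == '1' ? 1 : 0);
            filled++;
            if (filled == 8) {
                out.write(current);
                current = 0;
                filled = 0;
            }
        }
        if (filled > 0) {
            out.write(current << (8 - filled)); // pad the last byte with zero bits
        }
        return out.toByteArray();
    }

    /** Reverses pack(); bitCount tells the reader where the padding starts. */
    static String unpack(byte[] packed, int bitCount) {
        StringBuilder bits = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            int bit = (packed[i / 8] >> (7 - i % 8)) & 1;
            bits.append(bit == 1 ? '1' : '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String encoded = "11100111101011111101011011111000010000101";
        byte[] packed = pack(encoded);
        System.out.println(encoded.length() + " bits -> " + packed.length + " bytes");
        System.out.println(unpack(packed, encoded.length()).equals(encoded));
    }
}
```

With this layout the code "1001" from the question becomes the single byte 10010000, and the resulting array can be written with Files.write as in the snippet above. The bit count would have to be saved as well, for example in a small header.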
