Java: writing bytes into a .txt file
I am practicing Huffman encoding for my programming class, and I have finished almost all of the encoding part. For example, I have assigned each character a code (e.g. a = 100100) and converted each character in the text according to its code. Then I parse each code into a List<Byte>, e.g. parsing 100100 into a Byte and storing it in the list.

However, I need to write all the bytes into a .txt file, and I realized there is a problem.

Example: one character has the code "1001", and it is written into the .txt file as 1 byte instead of just 4 bits.

I know that after Huffman encoding, the characters are stored in a format like "11100111101011111101011011111000010000101", but in my case each character takes 1 byte, so the output is no smaller than the original input file before encoding.

Is there any way to store the code in a format like "11100111101011111101011011111000010000101"?

Sorry for my English; I tried my best to explain my confusion.
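import java.io.BufferedWriter;
import java.io.FileWriter;

// Writes the binary digits of each character's low 8 bits as the text
// characters '0' and '1' (leading zeros are omitted), so every code bit
// still occupies a whole byte in out.txt.
try (BufferedWriter bfw = new BufferedWriter(new FileWriter("out.txt"))) {
    for (char c : str.toCharArray()) {
        // Integer.toBinaryString is static; mask to keep only the low 8 bits
        bfw.write(Integer.toBinaryString(c & 0xFF));
    }
}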
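One way to get the compact form is to pack eight code bits into each byte and write the bytes with an OutputStream rather than a Writer. A minimal sketch, assuming the encoded text is already available as a String of '0' and '1' characters; the class name BitPacker, the most-significant-bit-first order and the zero padding of the last byte are my own choices, not anything the question specifies:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: packs a string of '0'/'1' characters into bytes,
// 8 code bits per byte, most significant bit first.
class BitPacker {

    static byte[] pack(String bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int current = 0; // bits collected for the byte being built
        int count = 0;   // how many bits are in 'current'
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("not a bit: " + c);
            }
            current = (current << 1) | (c - '0');
            count++;
            if (count == 8) { // byte is full: emit it and start a new one
                out.write(current);
                current = 0;
                count = 0;
            }
        }
        if (count > 0) { // pad the last partial byte with 0 bits on the right
            out.write(current << (8 - count));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String code = "11100111101011111101011011111000010000101"; // from the question
        byte[] packed = pack(code);
        System.out.println(code.length() + " bits -> " + packed.length + " bytes");
    }
}
```

The bytes can then be saved with Files.write(Paths.get("out.bin"), BitPacker.pack(code)), where "out.bin" is only an example name. Because the last byte may contain padding, the decoder also needs to know how many bits are valid.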
You could use a BitSet object if you intend to keep all the bits in memory.
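import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.BitSet;

BitSet bits = new BitSet();
bits.set(7000, true);              // set bit 7000 to 1
if (bits.get(7000)) { /* ... */ }  // read a bit back
byte[] bytes = bits.toByteArray(); // 8 bits per byte
Path path = Paths.get("C:/Temp/huffman.bin");
Files.write(path, bytes);          // writes the raw bytes (throws IOException)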
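One detail matters when a BitSet holds Huffman output: toByteArray() only covers bits up to the highest set bit, so trailing 0 bits of the encoded data are silently dropped, and it packs bit i into byte i/8 at position i%8 (least significant bit first). A minimal sketch, assuming the code arrives as a String of '0'/'1' characters; the class BitSetCodes and its two helpers are hypothetical names, and the bit count has to be stored somewhere next to the bytes:

```java
import java.util.BitSet;

class BitSetCodes {

    // Hypothetical helper: bit i of the BitSet holds character i of the '0'/'1' string.
    static BitSet toBitSet(String bits) {
        BitSet set = new BitSet(bits.length());
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                set.set(i);
            }
        }
        return set;
    }

    // Rebuilds the '0'/'1' string; 'length' must be saved separately because
    // a BitSet does not remember trailing 0 bits.
    static String fromBitSet(BitSet set, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(set.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String code = "1000000000000000"; // 16 bits, only the first one is 1
        byte[] bytes = toBitSet(code).toByteArray();
        System.out.println(bytes.length); // 1, not 2: the trailing zeros are dropped
        String restored = fromBitSet(BitSet.valueOf(bytes), code.length());
        System.out.println(restored.equals(code)); // true
    }
}
```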
Packing the bits into bytes directly, without a BitSet, is also feasible.
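A sketch of writing bits straight to a stream, assuming each Huffman code is held as an int value plus a bit length (e.g. 100100 with length 6). BitWriter is a hypothetical class, not a JDK API; it fills bytes most significant bit first and pads the final byte with 0 bits when closed:

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical class: collects bits and writes each completed byte straight
// to an OutputStream, most significant bit first.
class BitWriter implements Closeable {
    private final OutputStream out;
    private int current = 0; // bits of the byte being assembled
    private int count = 0;   // number of bits in 'current'

    BitWriter(OutputStream out) {
        this.out = out;
    }

    void writeBit(boolean bit) throws IOException {
        current = (current << 1) | (bit ? 1 : 0);
        if (++count == 8) {
            out.write(current);
            current = 0;
            count = 0;
        }
    }

    // Writes the lowest 'length' bits of 'code', highest of those first,
    // e.g. writeCode(0b1001, 4) writes 1, 0, 0, 1.
    void writeCode(int code, int length) throws IOException {
        for (int i = length - 1; i >= 0; i--) {
            writeBit(((code >>> i) & 1) == 1);
        }
    }

    // Pads the last partial byte with 0 bits, then closes the stream.
    @Override
    public void close() throws IOException {
        if (count > 0) {
            out.write(current << (8 - count));
        }
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (BitWriter w = new BitWriter(buffer)) {
            w.writeCode(0b100100, 6); // e.g. the code for 'a' in the question
            w.writeCode(0b1001, 4);
        }
        System.out.println(buffer.size()); // 2: 10 bits fit in 2 bytes
    }
}
```

Wrapping a file stream, e.g. new BitWriter(new BufferedOutputStream(new FileOutputStream("huffman.bin"))), writes the same bytes to disk; the BufferedOutputStream avoids one system call per byte.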
However, you cannot write chars: a Writer converts each char to bytes through a character encoding, which can turn one value into several bytes or alter it. Keep in mind that a Java char is a 16-bit UTF-16 code unit meant for Unicode text.
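To see the conversion problem concretely, the sketch below sends one packed byte value, 0x92, through a UTF-8 Writer and through a plain OutputStream. The class and method names are illustrative only; UTF-8 is chosen explicitly because FileWriter otherwise uses the platform default charset (UTF-8 by default only since Java 18):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CharVsByte {

    // Sends one byte value through a UTF-8 Writer, as text-oriented code does.
    static byte[] throughWriter(byte b) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            w.write(b & 0xFF); // becomes the char U+0000..U+00FF, then UTF-8 encoded
        }
        return sink.toByteArray();
    }

    // Sends the same value straight through an OutputStream.
    static byte[] throughStream(byte b) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(b); // only the low 8 bits are written, unchanged
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte packed = (byte) 0x92;
        System.out.println(throughWriter(packed).length); // 2 (0xC2 0x92)
        System.out.println(throughStream(packed).length); // 1
    }
}
```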
This writes binary data, not text.
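As for the trailing bits, I do not know how Huffman coding conventionally deals with them, so do a bit of research. I think padding the last byte with 0 bits will do, although it can produce an extra character if some code consists only of 0s and is short enough to fit in the padding. Alternatively, pad with the first few bits (at most 7) of a code that is longer than the padding: because Huffman codes are prefix-free, an incomplete prefix of a longer code cannot be decoded as an extra character. "Padding" is the keyword to search for.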
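Another way to sidestep the padding question (a different technique from the ones described above, not the only correct option) is to store how many bits are meaningful ahead of the packed data, so the reader simply stops there. A minimal sketch; the file layout (a 4-byte big-endian bit count written with DataOutputStream.writeInt, then the bits most significant bit first, zero-padded) and the class name PaddedBits are assumptions:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical format: [int bitCount][packed bits, last byte padded with 0s]
class PaddedBits {

    static byte[] write(String bits) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(bits.length()); // number of meaningful bits
            StringBuilder padded = new StringBuilder(bits);
            while (padded.length() % 8 != 0) {
                padded.append('0'); // pad to a whole number of bytes
            }
            for (int i = 0; i < padded.length(); i += 8) {
                out.write(Integer.parseInt(padded.substring(i, i + 8), 2));
            }
        }
        return bytes.toByteArray();
    }

    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int bitCount = in.readInt();
            StringBuilder sb = new StringBuilder(bitCount);
            while (sb.length() < bitCount) {
                int b = in.readUnsignedByte();
                // take bits MSB first, stopping before the padding
                for (int i = 7; i >= 0 && sb.length() < bitCount; i--) {
                    sb.append(((b >>> i) & 1) == 1 ? '1' : '0');
                }
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String code = "1001100"; // 7 bits -> 1 padding bit
        byte[] file = write(code);
        System.out.println(file.length);             // 5 (4-byte count + 1 data byte)
        System.out.println(read(file).equals(code)); // true
    }
}
```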
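Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.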