Why does FileInputStream readLine return null when I set the encoding to UTF-16?
It works fine with UTF-8, and it also works fine with UTF-16 if I use a different file.
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), "UTF-16"));
If I replace UTF-16 with UTF-8 in the above code, everything works as expected. Why is that?
The suggested answer is different because I just need to read the file. The answer was simple: I can't read a file as UTF-16 if it is actually encoded as UTF-8.
Check the encoding of your files. UTF-16 can be encoded using big-endian (UTF-16BE) or little-endian (UTF-16LE) byte order. These produce different byte sequences, so the charset you pass to the reader has to match the file.
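To make the difference concrete, here is a small sketch that prints the bytes Java produces for the single character "ü" in each of the four charsets. The class name EncodingBytes and the helper hex are made up for this illustration; only the standard Charset API is used.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration, not part of the original answer.
class EncodingBytes {

    // Encodes text with the named charset and returns the bytes
    // as lowercase hex pairs separated by spaces.
    static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00fc"; // "ü", written as an escape to avoid source-encoding issues
        for (String name : new String[] {"UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"}) {
            System.out.println(name + ": " + hex(text, name));
        }
    }
}
```

On a standard JDK this should print c3 bc for UTF-8, fe ff 00 fc for UTF-16 (Java's UTF-16 encoder writes a big-endian byte order mark first), 00 fc for UTF-16BE and fc 00 for UTF-16LE. When decoding, Java's "UTF-16" charset uses a leading byte order mark to pick the byte order and assumes big-endian if there is none.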
This code works for four variants of the same file.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class SOPlayground {

    public static void main(String[] args) throws Exception {
        // Each file is read with the charset it was actually written in.
        readAndPrint("/tmp/u-8.txt", Charset.forName("UTF-8"));
        readAndPrint("/tmp/u-16.txt", Charset.forName("UTF-16"));
        readAndPrint("/tmp/u-16le.txt", Charset.forName("UTF-16LE"));
        readAndPrint("/tmp/u-16be.txt", Charset.forName("UTF-16BE"));
    }

    private static void readAndPrint(String filePath, final Charset charset) throws IOException {
        // try-with-resources closes the reader (and the underlying stream) when done.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePath), charset))) {
            String line = br.readLine();
            while (line != null) {
                System.out.println(line);
                line = br.readLine();
            }
        }
    }
}
On GNU/Linux you can check the encoding using the file tool:
/tmp % file u*.txt
u-16be.txt: data
u-16le.txt: data
u-16.txt: Little-endian UTF-16 Unicode text, with no line terminators
u-8.txt: UTF-8 Unicode text
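If the file tool is not available, a rough equivalent can be sketched in Java by looking for a byte order mark (BOM) at the start of the file. The class BomSniffer and its methods are hypothetical names for this illustration. The check only recognises the UTF-8 and UTF-16 BOMs; a file without a BOM yields null, and a UTF-32LE file (whose BOM also starts with FF FE) would be misreported, so treat the result as a hint only.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for illustration, not part of the original answer.
class BomSniffer {

    // Returns a charset name guessed from the first n bytes in head,
    // or null if no known byte order mark is present.
    static String guessFromBom(byte[] head, int n) {
        if (n >= 3 && (head[0] & 0xff) == 0xEF && (head[1] & 0xff) == 0xBB
                && (head[2] & 0xff) == 0xBF) {
            return "UTF-8";
        }
        if (n >= 2 && (head[0] & 0xff) == 0xFE && (head[1] & 0xff) == 0xFF) {
            return "UTF-16BE";
        }
        if (n >= 2 && (head[0] & 0xff) == 0xFF && (head[1] & 0xff) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    // Reads up to three bytes from the start of the file and checks them for a BOM.
    static String guessFromFile(String path) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        try (InputStream in = new FileInputStream(path)) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // file is shorter than three bytes
                }
                n += r;
            }
        }
        return guessFromBom(head, n);
    }

    public static void main(String[] args) throws IOException {
        // Pass the files to inspect as command-line arguments.
        for (String path : args) {
            System.out.println(path + ": " + guessFromFile(path));
        }
    }
}
```

For the files above, u-16.txt (which file reports as little-endian UTF-16) would come back as UTF-16LE. The two files that file reports only as data would presumably come back as null, since a report of plain data suggests they carry no BOM.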
The contents of these files are all different:
/tmp % cat u*.txt
����
����
������
üäöü
But using the above Java code, they can all be read correctly. The output of my Java code is:
üäöü
üäöü
üäöü
üäöü
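For contrast, here is a minimal sketch of what goes wrong when the charset does not match the data, as in the question. It encodes the same text as UTF-8 and then decodes those bytes as UTF-16. The class WrongCharsetDemo and its method are hypothetical names for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration, not part of the original answer.
class WrongCharsetDemo {

    // Encodes text as UTF-8, then (incorrectly) decodes the bytes as UTF-16.
    static String decodeUtf8BytesAsUtf16(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String original = "\u00fc\u00e4\u00f6\u00fc"; // "üäöü"
        String wrong = decodeUtf8BytesAsUtf16(original);
        // The UTF-8 byte pairs are reinterpreted as 16-bit code units.
        System.out.println("same text: " + original.equals(wrong));
        System.out.println("first code unit: U+"
                + Integer.toHexString(wrong.charAt(0)).toUpperCase());
    }
}
```

The eight UTF-8 bytes are read as four big-endian 16-bit units, so the result is four unrelated characters (the first is U+C3BC) instead of the original text.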