Wrong/strange encoding of a file converted from CP1250 to UTF-8 in Java

Question

I have a problem converting a file from CP1250 to UTF-8 correctly. Almost all characters are converted correctly, but the characters "ň" and "Ř" are not (they come out as "?").

In NetBeans I set UTF-8 encoding for the project.

The test string in the file can be "skříň SKŘÍŇ". Output on the console: "skĹ™ĂĹ? SKĹ?ĂŤĹ‡". The output differs from what the same conversion produces in, for example, PHP. I'm at a loss.

My code:

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("file-cp1250.txt"), "CP1250"));
while ((line = br.readLine()) != null) {
  line = new String(line.getBytes("UTF-8"), "CP1250");
  System.out.println(line);
}

Thanks for any advice.

Answer 1

The following would be correct in principle:

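// Decode the file's bytes as CP1250 while reading; no further conversion is needed.
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("file-cp1250.txt"), "CP1250"))) {
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}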

That is, the binary data of the InputStream is declared to be Windows code page 1250 and is decoded as it is read. A Java String always holds Unicode (so it can combine all scripts). The extra new String(line.getBytes("UTF-8"), "CP1250") step in the question is therefore unnecessary, and it is what corrupts the text.
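To see why that extra step produces the output in the question, here is a minimal sketch that repeats it on a String that is already correct. The class name DoubleDecodeDemo and the helper misdecode are made up for illustration, and "windows-1250" is assumed to be available in the running JDK (it is the canonical name for the charset the question calls "CP1250"). The Czech letters are written as Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleDecodeDemo {
    // Hypothetical helper that repeats the question's extra step:
    // encode a correct String as UTF-8, then misread those bytes as CP1250.
    public static String misdecode(String text) {
        Charset cp1250 = Charset.forName("windows-1250");
        return new String(text.getBytes(StandardCharsets.UTF_8), cp1250);
    }

    public static void main(String[] args) {
        String original = "sk\u0159\u00ED\u0148"; // "skříň"
        String garbled = misdecode(original);

        // Print code points instead of glyphs, so the result does not
        // depend on what the console is able to display.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Each of "ř", "í" and "ň" takes two bytes in UTF-8, and each of those bytes is then read as a separate CP1250 character. For "ň" the bytes are 0xC5 0x88: 0xC5 is "Ĺ" in CP1250, while 0x88 has no character assigned there, so the decoder has to substitute a replacement character, which a console typically shows as "?". The same happens for "Ř" (0xC5 0x98), which would explain why exactly those two letters look worse than the others.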

However, System.out is in general the platform-dependent console, and its encoding might not be Cp1250 but something else. The Unicode text might be converted to Cp1252, Microsoft's Latin-1 variant, which cannot represent all of these characters. One might then suspect a bug in the code, when in fact System.out simply cannot be used to check the result.
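Since the console is unreliable for checking the result, one option is to write the decoded text straight to a UTF-8 file and inspect that instead. The following is only a sketch under assumptions: the class name Cp1250ToUtf8, the method convert and the output file name file-utf8.txt are invented for illustration, and "windows-1250" is again assumed to be available in the JDK.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Cp1250ToUtf8 {
    // Hypothetical helper: decode the source as CP1250, encode the target as UTF-8.
    public static void convert(Path source, Path target) throws IOException {
        Charset cp1250 = Charset.forName("windows-1250");
        try (BufferedReader in = Files.newBufferedReader(source, cp1250);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);   // the String is Unicode; the writer encodes it as UTF-8
                out.newLine();     // uses the platform line separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "file-utf8.txt" is an assumed output name, not taken from the question.
        convert(Paths.get("file-cp1250.txt"), Paths.get("file-utf8.txt"));
    }
}
```

The resulting file can then be opened in an editor that is set to UTF-8. If console output is still wanted, wrapping the stream as new PrintStream(System.out, true, "UTF-8") makes Java emit UTF-8 bytes, but whether they display correctly still depends on the terminal's own encoding.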

Answer 1 (3, accepted, 2017-08-27 20:39:45)
