Which encoding does Java use to create a String from given Unicode data?

Question

I am quite perplexed about why I should not encode Unicode text with UTF-8 for comparison when the other text (the one I compare against) has been encoded with UTF-8.

I wanted to compare a text (アクセス拒否, meaning "Access denied") stored in an external UTF-8 encoded file with a constant string stored in a .java file:

public static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426"; // means Access denied

The Java file was encoded as Cp1252.
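Because the constant is written entirely with Unicode escapes, the .java file only contains ASCII characters there, so its Cp1252 encoding does not change the value: the compiler translates each escape into the corresponding char before compiling. A small sketch to check which code points the constant actually holds (the class and method names are illustrative, not from the question):

```java
import java.util.StringJoiner;

class EscapeCheck {
    // Same constant as in the question: six Unicode escapes, pure ASCII in the source file
    static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426";

    // Formats every code point of s as U+XXXX, separated by single spaces
    static String codePointsHex(String s) {
        StringJoiner joiner = new StringJoiner(" ");
        s.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(ACCESS_DENIED_IN_JAPANESE.length());            // 6
        System.out.println(codePointsHex(ACCESS_DENIED_IN_JAPANESE));
        // U+30A2 U+30AF U+30BB U+30B9 U+62D2 U+5426
    }
}
```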

I read the file as an input stream using the code below. Note that I decode the bytes with UTF-8.

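import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// (inside a method declared to throw IOException)
// Collect the raw bytes; ByteArrayOutputStream grows as needed, unlike a fixed byte[4096]
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
try (InputStream in = new FileInputStream("F:\\sample.txt")) {
    int b;
    while ((b = in.read()) != -1) {
        buffer.write(b);
    }
}

// Decode the bytes explicitly as UTF-8
String japTextFromFile = new String(buffer.toByteArray(), StandardCharsets.UTF_8);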

Now when I compare them like this:

System.out.println(ACCESS_DENIED_IN_JAPANESE.equals(japTextFromFile));  // result is `true`, and works fine

But when I encode ACCESS_DENIED_IN_JAPANESE with UTF-8 and compare it with japTextFromFile, the result is false. The code is:

String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(),Charset.forName("UTF-8"));

System.out.println(encodedAccessDenied.equals(japTextFromFile));  // result is `false`

So my question is: why does the comparison above fail when both strings are the same and both have been encoded with UTF-8? The result should be true.

However, in the first case I compared differently encoded strings, one in UTF-16 (the encoding Java uses for strings by default) and the other in UTF-8, and the result is true. I think it should be false because the encodings differ, even though the text we read is the same.
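The point that resolves this: a Java String does not carry an encoding. A charset such as UTF-8 only describes byte arrays; once bytes are decoded, the result is a sequence of UTF-16 chars as far as the String API is concerned, and String.equals compares those chars. The sketch below (class name illustrative) shows that the UTF-8 and UTF-16 byte arrays for the same text differ, while the Strings decoded from them are equal:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingVsString {
    static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426";

    public static void main(String[] args) {
        // Encoding the same text with different charsets gives different byte sequences
        byte[] utf8 = ACCESS_DENIED_IN_JAPANESE.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = ACCESS_DENIED_IN_JAPANESE.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(utf8.length);                 // 18: each of these chars takes 3 bytes
        System.out.println(utf16.length);                // 12: each char takes 2 bytes (no BOM)
        System.out.println(Arrays.equals(utf8, utf16));  // false

        // Decoding each byte array with its own charset yields the same String
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        String fromUtf16 = new String(utf16, StandardCharsets.UTF_16BE);
        System.out.println(fromUtf8.equals(fromUtf16));  // true
    }
}
```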

Where is my understanding wrong? Any clarification is greatly appreciated.

Answer 1

ACCESS_DENIED_IN_JAPANESE.getBytes() does not use UTF-8. It uses your platform's default charset. But then you use UTF-8 to turn those bytes back into a String. This gives you a different String from the one you started with.
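A sketch that reproduces the mismatch deterministically. It assumes the platform default charset was windows-1252 (Cp1252, like the source file; the question does not say so explicitly), and names that charset instead of relying on the no-argument getBytes(). windows-1252 cannot represent these characters, and String.getBytes(Charset) replaces each unmappable character with the charset's replacement byte '?', so decoding those bytes as UTF-8 gives "??????":

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetMismatch {
    static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426";

    // Encodes with one charset and decodes with another
    static String encodeThenDecode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith); // unmappable chars become the replacement byte
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Assumed legacy default charset: it has no mapping for these Japanese characters
        Charset cp1252 = Charset.forName("windows-1252");
        String mismatched = encodeThenDecode(ACCESS_DENIED_IN_JAPANESE, cp1252, StandardCharsets.UTF_8);
        System.out.println(mismatched);                                   // ??????
        System.out.println(mismatched.equals(ACCESS_DENIED_IN_JAPANESE)); // false

        // The same charset on both sides: the text survives the round trip
        String matched = encodeThenDecode(ACCESS_DENIED_IN_JAPANESE,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(matched.equals(ACCESS_DENIED_IN_JAPANESE));    // true
    }
}
```

On Java 18 and later the default charset is UTF-8 unless overridden (JEP 400), so the original no-argument getBytes() call would usually round-trip there; passing the charset explicitly removes the dependency on the platform either way.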

Try this:

import java.nio.charset.StandardCharsets;

String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

System.out.println(encodedAccessDenied.equals(japTextFromFile));  // result is `true`
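Because String.getBytes(Charset) substitutes unmappable characters instead of failing, a charset mistake only shows up later as an unequal String. If you would rather detect it at the point of encoding, a CharsetEncoder from Charset.newEncoder() reports unmappable characters by default. A sketch, again assuming windows-1252 as the legacy charset (the helper name is illustrative):

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class StrictEncoding {
    static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426";

    // Returns true if text can be encoded in the charset without any substitution
    static boolean encodesCleanly(String text, Charset charset) {
        // A fresh encoder reports malformed and unmappable input instead of replacing it
        CharsetEncoder encoder = charset.newEncoder();
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false; // e.g. UnmappableCharacterException
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(encodesCleanly(ACCESS_DENIED_IN_JAPANESE, cp1252));                 // false
        System.out.println(encodesCleanly(ACCESS_DENIED_IN_JAPANESE, StandardCharsets.UTF_8)); // true
    }
}
```

CharsetEncoder.canEncode(CharSequence) gives a similar yes/no answer without handling the exception yourself.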

Answer 2

The best way I know is to put all static texts into a text file encoded with UTF-8, and then read those resources with a FileReader, setting the encoding to "UTF-8". (The FileReader constructors that take a Charset exist since Java 11; on older versions, wrap a FileInputStream in an InputStreamReader with UTF-8.)
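A sketch of this approach, assuming the strings live in a UTF-8 text file with one entry per line (the file layout and names are hypothetical; a temporary file stands in for the real one). It reads through an InputStreamReader with an explicit UTF-8 charset, which works on any Java version; on Java 11+, new FileReader(file, StandardCharsets.UTF_8) is equivalent:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class Utf8TextResources {
    // Reads every line of a UTF-8 text file, decoding with an explicit charset
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Hypothetical demo: writes one message to a temporary UTF-8 file and reads it back
    static String firstMessage(String message) {
        try {
            Path file = Files.createTempFile("messages", ".txt");
            try {
                Files.write(file, (message + "\n").getBytes(StandardCharsets.UTF_8));
                return readLines(file).get(0);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426";
        System.out.println(firstMessage(expected).equals(expected)); // true
    }
}
```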
