Java: Replacing missing Unicode symbols in a string?

Question

I have a rather straightforward question. When I read a string from a stream, all of the letters are fine except symbols. For example, if I try to read a username that has the ™ or the © symbol in it, the symbols print out as â„¢ and Â©, respectively. I thought that Java supported all of the Unicode characters. How can I get the symbols to be printed out correctly?

Is there a special type of string that I could use, or perhaps another solution to this problem?

Answer 1

When reading from a stream, e.g. using

InputStreamReader reader = new InputStreamReader(stream);

you tell Java to use the platform's default encoding. That may well not be a Unicode encoding; given how common Windows PCs are, it probably is not at least half the time.

You need to specify the encoding of the byte stream, e.g.

InputStreamReader reader = new InputStreamReader(stream, charset);

Or

InputStreamReader reader = new InputStreamReader(stream, "UTF-8");

if you pass the charset name rather than a Charset instance.
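As a rough illustration of the point above, the sketch below decodes the same bytes twice, once with the right charset and once with the wrong one. The class name ReadWithCharset, the helper readAll, and the sample user name are made up for the example, and the ByteArrayInputStream only stands in for whatever stream you actually read from. It also assumes that the mismatched encoding is windows-1252, which is what the â„¢ output suggests.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {

    // Hypothetical helper: decodes the whole stream using the given charset.
    static String readAll(InputStream stream, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(stream, charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the real stream: a user name encoded as UTF-8 bytes.
        String name = "Acme\u2122 \u00A9";
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips cleanly.
        String right = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);

        // Decoding the same bytes as windows-1252 produces the garbled text.
        String wrong = readAll(new ByteArrayInputStream(utf8Bytes),
                               Charset.forName("windows-1252"));

        // Compare strings instead of trusting how a console renders them.
        System.out.println(right.equals(name));                                   // true
        System.out.println(wrong.equals("Acme\u00E2\u201E\u00A2 \u00C2\u00A9"));  // true
    }
}
```

The Charset overload is used here instead of the charset name so that a misspelled name cannot surface as a checked UnsupportedEncodingException at the read site.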

Answer 2

Based on the character examples you are giving, I believe you are reading in the characters correctly. For example, the copyright character is U+00A9. When you write it out in UTF-8, however, it is serialized as 2 bytes: C2 followed by A9. See http://www.fileformat.info/info/unicode/char/a9/index.htm

If your output device expects data in UTF-8 format, all will be well. However, since you are seeing Â©, I believe your output device expects data in ISO-8859-1 (see http://en.wikipedia.org/wiki/ISO/IEC_8859-1 ), so you have a mismatch. The output device interprets the C2 as Â and the A9 as ©.
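The byte-level claim above can be checked with a few lines of Java. This is only a sketch: the class name CopyrightBytes and the helper toHex are invented for the illustration.

```java
import java.nio.charset.StandardCharsets;

public class CopyrightBytes {

    // Formats bytes as upper-case hex pairs separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9";

        // UTF-8 needs two bytes for this character.
        byte[] utf8 = copyright.getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8));    // C2 A9

        // ISO-8859-1 needs only one.
        byte[] latin1 = copyright.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toHex(latin1));  // A9

        // What an ISO-8859-1 device makes of the UTF-8 bytes: two characters.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());                // 2
        System.out.println(misread.equals("\u00C2\u00A9"));  // true
    }
}
```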

To fix this in code (without changing your output device) you need to create a print stream that uses the ISO-8859-1 character encoding when it converts your Unicode characters to a byte stream. For example:

import java.io.PrintStream;

// class name is arbitrary
public class EncodingDemo
{
    public static void main (String [] args) throws Exception
    {
        // print using the JVM's default character encoding
        String s = "copyright is ©";
        System.out.println(s);

        // wrap System.out in a stream that encodes characters as ISO-8859-1
        PrintStream out = new PrintStream(System.out, true, "ISO-8859-1");
        out.println(s);
    }
}

In my case the first println looks good, because the IDE console window uses UTF-8 encoding, and the second one looks garbled. In your case the first line should be bad (showing two characters where the copyright symbol should be) and the second one should show the correct copyright character.
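Since what you see depends on the console, one way to check what each stream actually emits is to point the PrintStream at a byte buffer instead of System.out. The sketch below does that. The class name PrintStreamBytes and the helper encodeVia are hypothetical, not part of the code above.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class PrintStreamBytes {

    // Hypothetical helper: returns the bytes that a PrintStream with the
    // named encoding emits for the given text.
    static byte[] encodeVia(String text, String encodingName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, encodingName)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encodingName, e);
        }
        // Closing the stream above has flushed everything into the sink.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String s = "\u00A9";

        // A UTF-8 stream writes two bytes for the copyright sign.
        System.out.println(encodeVia(s, "UTF-8").length);       // 2

        // An ISO-8859-1 stream writes a single byte.
        System.out.println(encodeVia(s, "ISO-8859-1").length);  // 1
    }
}
```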

2 answers

Answer 1
Score 2, accepted, 2012-09-19 00:09:19

Answer 2
Score 0, 2012-09-19 00:40:23
