UTF-8 characters with JAXB in Java 8

I recently migrated an application from JBoss AS 5 to Wildfly 8, and as such had to move from Java 6 to Java 8.

I'm now encountering a problem when running one of my unit tests through Ant:

[javac] C:\Users\test\JAXBClassTest.java:123: error: unmappable character for encoding UTF8

Line 123 of the test class is:

Assert.assertEquals("Jµhn", JAXBClass.getValue()); 

This test is in place specifically to ensure that the JAXB marshaller can handle UTF-8 characters, which I believe µ is. I have added a property onto the JAXB marshaller to ensure that these characters are allowed:

marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");

I've seen multiple questions ( 1 , 2 , 3 ) on Stack Overflow which seem similar, but their answers either explain why invalid characters that were previously decoded one way are now decoded another way, or don't appear to describe the same issue I have.

If all the characters are valid, should this cause an issue? I know I must be missing something but I can't see what.

The problem is that in your source file the µ is stored as the single byte 0xB5 (octal \265), which is its ISO-8859-1 encoding. On its own that byte is not valid UTF-8. In UTF-8, µ (U+00B5) is encoded as the two bytes 0xC2 0xB5.
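The two encodings of µ can be checked directly. The following is a minimal sketch; the class name `MicroSignBytes` and the helper `hex` are made up for illustration, and the literal is written as the escape `\u00B5` so the example does not depend on how this file itself is saved.

```java
import java.nio.charset.StandardCharsets;

public class MicroSignBytes {
    // Formats a byte array as space-separated uppercase hex, e.g. "C2 B5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escape stands for the micro sign and keeps this file encoding-independent.
        String micro = "\u00B5";
        System.out.println("ISO-8859-1: " + hex(micro.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8: " + hex(micro.getBytes(StandardCharsets.UTF_8)));
    }
}
```

This prints `ISO-8859-1: B5` and `UTF-8: C2 B5`.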

In this source the character encoding for the file is ISO-8859-1.

class Latin1 {
    public static void main(String[] args) {
        String s = "µ"; // stored as the single byte 0xB5 (octal 265)
        System.out.println(s);
    }
}

Which can be compiled with ...

javac -encoding iso8859-1 Latin1.java

... but it fails with UTF-8 encoding

javac -encoding UTF-8 Latin1.java
Latin1.java:3: error: unmappable character for encoding UTF-8
        String s = "?";
                    ^
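The same failure can be reproduced outside the compiler with a strict decoder. This is only a sketch of the idea, not a description of how javac is implemented; `StrictDecodeDemo` and `decodes` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {
    // Returns true if the bytes form a valid sequence in the given charset.
    static boolean decodes(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown for malformed or unmappable input when the action is REPORT.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Micro = {(byte) 0xB5};             // as stored in Latin1.java
        byte[] utf8Micro = {(byte) 0xC2, (byte) 0xB5};  // as stored in Utf8.java

        System.out.println(decodes(latin1Micro, StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodes(latin1Micro, StandardCharsets.UTF_8));      // false
        System.out.println(decodes(utf8Micro, StandardCharsets.UTF_8));        // true
    }
}
```

A lone 0xB5 is a UTF-8 continuation byte with no lead byte before it, which is why the second check fails.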

In this source the character encoding for the file is UTF-8.

class Utf8 {
    public static void main(String[] args) {
        String s = "µ"; // stored as the two bytes 0xC2 0xB5
        System.out.println(s);
    }
}

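Which can be compiled with ISO-8859-1 as well as with UTF-8. With ISO-8859-1 the compile succeeds because every byte is a valid ISO-8859-1 character, but the two bytes are then read as two separate characters (Âµ), so the string is no longer the one intended.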

javac -encoding UTF-8 Utf8.java
javac -encoding iso8859-1 Utf8.java
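Compiling without an error does not mean the literal survived intact. The sketch below mimics what happens when UTF-8 bytes are read as ISO-8859-1; `MojibakeDemo` and `reinterpret` are hypothetical names, and the string reuses the `Jµhn` value from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text with one charset and decodes the resulting bytes with another.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String expected = "J\u00B5hn"; // the escape stands for the micro sign
        String misread = reinterpret(expected, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(expected.length());       // 4
        System.out.println(misread.length());        // 5
        System.out.println(expected.equals(misread)); // false
    }
}
```

The misread string contains the two characters U+00C2 and U+00B5 where the original had one, so an assertion comparing it with the intended value would fail.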

Edit: In case copying and pasting from the web page alters the encoding, both source files can be created as below, which should make the difference visible.

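import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WriteSources {
    public static void main(String[] args) throws IOException {
        // \u00B5 is the micro sign; the escape lets this file compile under any -encoding
        String latin1 = "class Latin1 {\n"
                + "    public static void main(String[] args) {\n"
                + "        String s = \"\u00B5\";\n"
                + "        System.out.println(s);\n"
                + "    }\n"
                + "}\n";
        // The micro sign is written as the single byte 0xB5
        Files.write(Paths.get("Latin1.java"),
                latin1.getBytes(StandardCharsets.ISO_8859_1));

        String utf8 = "class Utf8 {\n"
                + "    public static void main(String[] args) {\n"
                + "        String s = \"\u00B5\";\n"
                + "        System.out.println(s);\n"
                + "    }\n"
                + "}\n";
        // The micro sign is written as the two bytes 0xC2 0xB5
        Files.write(Paths.get("Utf8.java"),
                utf8.getBytes(StandardCharsets.UTF_8));
    }
}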
