JAXB & UTF-8 Unmarshal exception “Invalid byte 2 of 2-byte UTF-8 sequence”

I've read a few SO answers saying that JAXB has a bug, blamed on the nature of XML, that prevents it from working with UTF-8. My question is: what is the workaround? Users may copy and paste Unicode characters into a data field, and I need to preserve, marshal, unmarshal, and redisplay those characters elsewhere.

(Update) More context:

Candidate c = new Candidate();
c.addSubstitution("3 4ths", "\u00BE");
c.addSubstitution("n with tilde", "\u00F1");
c.addSubstitution("schwa", "\u018F");
c.addSubstitution("Sigma", "\u03A3");
c.addSubstitution("Cyrillic Th", "\u040B");
jc = JAXBContext.newInstance(Candidate.class);
Marshaller marshaller = jc.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
ByteArrayOutputStream os = new ByteArrayOutputStream();
marshaller.marshal(c, os);
String xml = os.toString();
System.out.println(xml);
jc = JAXBContext.newInstance(Candidate.class);
Unmarshaller jaxb = jc.createUnmarshaller();
ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes());
Candidate newCandidate = (Candidate) jaxb.unmarshal(is);
for (Substitution s : c.getSubstitutions()) {
    System.out.println(s.getSubstitutionName() + "='" + s.getSubstitutionValue() + "'");
}

Here's a little test I threw together. The exact characters I get are not entirely under my control: users may paste an N with tilde, or anything else, into the field.

This is the problem in your test code:

ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes());

You're using the platform default encoding to convert the string to a byte array. Don't do that. You've specified that you're going to use UTF-8, so you must do so when you create the byte array:

ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes("UTF-8"));
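To see why the charset matters, here is a minimal sketch using only the JDK. The class name `EncodingDemo` and the helper `hex` are made up for illustration. It encodes the "n with tilde" character from the question with two different charsets and shows that the resulting bytes differ, so bytes produced with a non-UTF-8 charset do not decode back to the original character when read as UTF-8. It uses `StandardCharsets.UTF_8` (Java 7+), which is equivalent to passing `"UTF-8"` but does not throw the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text as ISO-8859-1 and then (wrongly) decodes it as UTF-8.
    static String mismatchedRoundTrip(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nTilde = "\u00F1";

        // UTF-8 needs two bytes for this character.
        System.out.println(hex(nTilde.getBytes(StandardCharsets.UTF_8)));      // C3 B1

        // ISO-8859-1 uses a single byte, which is not valid UTF-8 on its own.
        System.out.println(hex(nTilde.getBytes(StandardCharsets.ISO_8859_1))); // F1

        // Decoding the ISO-8859-1 bytes as UTF-8 does not give the character back.
        System.out.println(mismatchedRoundTrip(nTilde).equals(nTilde));        // false
    }
}
```

Which charset `getBytes()` with no argument picks depends on the machine, which is why the same test can pass on one platform and fail on another.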

Likewise, don't use ByteArrayOutputStream.toString(), which again uses the platform default encoding. Indeed, you don't need to convert the output to a string at all:

ByteArrayOutputStream os = new ByteArrayOutputStream();
marshaller.marshal(c, os);
byte[] xml = os.toByteArray();
jc = JAXBContext.newInstance(Candidate.class);
Unmarshaller jaxb = jc.createUnmarshaller();
ByteArrayInputStream is = new ByteArrayInputStream(xml);
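The same byte-level behaviour can be reproduced without JAXB, which is not bundled with recent JDKs. The sketch below is an illustration under that assumption, not the code from the question: it uses the JDK's built-in DOM parser, and the names `XmlBytesDemo`, `encode`, and `parseOrNull` are invented here. It feeds the parser a document that declares UTF-8, once encoded as UTF-8 and once encoded with a different charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XmlBytesDemo {
    // Builds a tiny document that declares UTF-8, then encodes it with the given charset.
    static byte[] encode(String text, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><value>" + text + "</value>";
        return xml.getBytes(charset);
    }

    // Parses the raw bytes and returns the root element's text,
    // or null if the parser rejects the input.
    static String parseOrNull(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (IOException | SAXException | ParserConfigurationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String text = "\u00BE\u00F1\u03A3";

        // Bytes that really are UTF-8 parse cleanly and preserve the characters.
        String parsed = parseOrNull(encode(text, StandardCharsets.UTF_8));
        System.out.println(text.equals(parsed)); // true

        // In ISO-8859-1, U+00C6 is the single byte 0xC6. Read as UTF-8, that byte
        // starts a 2-byte sequence, but the next byte is '<', so parsing fails.
        String broken = parseOrNull(encode("\u00C6", StandardCharsets.ISO_8859_1));
        System.out.println(broken == null); // true
    }
}
```

Keeping the data as a `byte[]` from marshalling to unmarshalling, as in the answer's code, avoids the intermediate string and with it any chance of a charset mismatch.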

This should have no problems with the characters you're using. It will still have problems with characters that can't be represented in XML 1.0 (characters below U+0020 other than \r, \n and \t), but that's all.
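If pasted user input might contain such characters, one option is to filter them before marshalling. The following is a rough sketch, not part of the original answer: `XmlCharFilter`, `isXml10Char`, and `stripInvalid` are hypothetical names, and silently dropping characters is an assumption about what the application wants (rejecting the input may be more appropriate). The ranges follow the `Char` production of the XML 1.0 specification.

```java
public class XmlCharFilter {
    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed code point removed.
    static String stripInvalid(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
                .filter(XmlCharFilter::isXml10Char)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0001 is dropped; the tab and the n with tilde are kept.
        String dirty = "a" + (char) 1 + "b\tc\u00F1";
        System.out.println(stripInvalid(dirty).equals("ab\tc\u00F1")); // true
    }
}
```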
