
Explanation of JAXB error: Invalid byte 1 of 1-byte UTF-8 sequence

We're parsing an XML document using JAXB and get this error:

[org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)

What exactly does this mean, and how can we resolve it?

We are executing the code as follows:

jaxbContext = JAXBContext.newInstance(Results.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(getSchema());
results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile));

Update

The issue appears to be due to this "funny" character in the XML file: ¿

Why would this cause such a problem?

Update 2

There are two of those weird characters in the file, around the middle. Note that the file is generated from data in a database, and those weird characters somehow got into the database.

Update 3

Here is the full XML snippet:

<Description><![CDATA[Mt. Belvieu ¿ Texas]]></Description>

Update 4

Note that there is no <?xml ...?> header.

The hex value of the special character is BF.

So, your problem is that JAXB (or rather the XML parser underneath it) treats XML files without an <?xml ...?> header as UTF-8, while your file uses some other encoding (probably ISO-8859-1 or Windows-1252, if the 0xBF byte is actually intended to mean ¿). In UTF-8, 0xBF can only appear as a continuation byte inside a multi-byte sequence, never on its own, hence the error.
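To make the byte-level reasoning concrete, here is a minimal sketch using only the JDK's strict charset decoders. The class and method names (EncodingCheck, decodeStrict) are made up for illustration and do not come from JAXB; the sample bytes assume the 0xBF value reported in the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    // Decodes strictly: returns null on malformed input instead of
    // silently substituting the replacement character U+FFFD.
    static String decodeStrict(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // "Mt. " followed by the lone byte 0xBF from the question.
        byte[] raw = {'M', 't', '.', ' ', (byte) 0xBF};
        // A lone 0xBF is not valid UTF-8, so this is rejected.
        System.out.println(decodeStrict(raw, StandardCharsets.UTF_8) == null);
        // In ISO-8859-1 every byte maps to a character; 0xBF is U+00BF.
        System.out.println(decodeStrict(raw, StandardCharsets.ISO_8859_1) != null);
    }
}
```

Running a check like this over the suspect bytes of the real file is a quick way to confirm which encoding the producer actually used before changing any parsing code.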

If you can change the producer of the file, you can add an <?xml ...?> header that declares the actual encoding, or simply write the file as UTF-8.
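On the producer side, a sketch of the "declare the encoding and write UTF-8" option might look like the following. The thread does not show how the file is actually generated, so ProducerSketch, render and the output file name results.xml are purely hypothetical; only the Description element is taken from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ProducerSketch {
    // Builds the document with an explicit declaration and encodes it as
    // UTF-8, so the declared and the actual encoding agree.
    // Assumption: description never contains "]]>", which would end the
    // CDATA section early.
    static byte[] render(String description) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<Description><![CDATA[" + description + "]]></Description>\n";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // \u00BF is the inverted question mark from the question.
        byte[] bytes = render("Mt. Belvieu \u00BF Texas");
        Files.write(Paths.get("results.xml"), bytes);
    }
}
```

The point of the sketch is that the character is encoded explicitly as UTF-8 (two bytes, C2 BF) rather than with whatever the platform default charset happens to be.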

If you can't change the producer, you have to use an InputStreamReader with an explicit encoding, because (unfortunately) JAXB doesn't allow you to change its default encoding:

results = (Results) unmarshaller.unmarshal(
   new InputStreamReader(new FileInputStream(inputFile), "ISO-8859-1")); 

However, this solution is fragile: when the parser is handed a Reader it ignores any encoding declaration, so input files whose <?xml ...?> header declares a different encoding get decoded incorrectly.
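Both the fix and its fragility can be reproduced without JAXB. The sketch below uses the JDK's built-in DOM parser as a stand-in, on the assumption that it handles encodings the same way as the parser behind the Unmarshaller; ReaderSketch, parseBytes and parseAs are illustrative names only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ReaderSketch {
    // Hands raw bytes to the parser, which has to detect the encoding itself.
    static String parseBytes(byte[] xml) {
        return parse(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Decodes with the given charset first; the parser then reads characters
    // and disregards any encoding declaration in the document.
    static String parseAs(byte[] xml, Charset charset) {
        return parse(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(xml), charset)));
    }

    // Returns the root element's text, or null if parsing fails.
    private static String parse(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // No header, ISO-8859-1 bytes: the situation from the question.
        byte[] latin1 = "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>"
                .getBytes(StandardCharsets.ISO_8859_1);
        // A well-behaved file: UTF-8 bytes with a matching declaration.
        byte[] utf8 = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Description>\u00BF</Description>").getBytes(StandardCharsets.UTF_8);

        System.out.println(parseBytes(latin1));                            // parser assumes UTF-8
        System.out.println(parseAs(latin1, StandardCharsets.ISO_8859_1));  // explicit charset
        System.out.println(parseAs(utf8, StandardCharsets.ISO_8859_1));    // wrong charset forced
    }
}
```

The third call is the fragile case: the UTF-8 file parses without an exception, but its text comes back mis-decoded, so forcing a charset is only safe when every input file is known to use it.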

That's probably a Byte Order Mark (BOM), a special byte sequence at the start of a UTF file. They are, frankly, a pain in the arse, and seem particularly common when interacting with .NET systems.

Try rephrasing your code to use a Reader rather than an InputStream:

results = (Results) unmarshaller.unmarshal(new FileReader(inputFile));

A Reader decodes the bytes into characters before the parser sees them (FileReader uses the platform default charset), and might make a better stab at it. More simply, pass the File directly to the Unmarshaller, and let the JAXBContext worry about it:

results = (Results) unmarshaller.unmarshal(inputFile);
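The BOM theory is easy to test: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file (note that it ends in the same 0xBF byte). A small sketch, assuming Java 11 or later for readNBytes and using the hypothetical names BomCheck and startsWithUtf8Bom:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomCheck {
    // True if the given leading bytes begin with the UTF-8 BOM (EF BB BF).
    static boolean startsWithUtf8Bom(byte[] head) {
        return head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the XML file to inspect (hypothetical usage).
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = in.readNBytes(3);
            System.out.println(startsWithUtf8Bom(head));
        }
    }
}
```

If this prints false, or the offending byte sits in the middle of the file as described in the question's updates, a BOM is unlikely to be the cause.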

It sounds as if your XML is encoded with UTF-16 but that encoding is not getting passed to the Unmarshaller. With the Marshaller you can set it using marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-16"); but because the Unmarshaller is not required to support any properties, I am not sure how to enforce it other than by ensuring your XML document has encoding="UTF-16" in the initial <?xml?> declaration.
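One possibility that this answer does not mention, offered here as an assumption rather than a confirmed fix, is to hint the encoding through org.xml.sax.InputSource.setEncoding; Unmarshaller also has an unmarshal(InputSource) overload. The sketch below tries the idea with the JDK's built-in DOM parser instead of JAXB and uses ISO-8859-1, since that matches the byte reported in the question. EncodingHintSketch and parseWithHint are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EncodingHintSketch {
    // Parses a byte stream with an externally supplied encoding name.
    // A null encoding leaves detection to the parser.
    // Returns the root element's text, or null if parsing fails.
    static String parseWithHint(byte[] xml, String encoding) {
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        source.setEncoding(encoding);
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseWithHint(latin1, null));          // no hint
        System.out.println(parseWithHint(latin1, "ISO-8859-1"));  // with hint
    }
}
```

Whether a given JAXB implementation passes the hint through to its parser unchanged is worth verifying against the real input before relying on it.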

