Message: Invalid byte 1 of 1-byte UTF-8 sequence in Hadoop

Question

I'm parsing XML using Hadoop, and I got the code from here.

But I'm getting the following error:

FINISH_TIME="1385387129970" HOSTNAME="DEV140" ERROR="java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[18,3] Message: Invalid byte 1 of 1-byte UTF-8 sequence.

But my XML is encoded as UTF-8 only. So how can I handle it?

Answer 1

I suspect this is the problem - it's at least a problem:

XMLStreamReader reader = XMLInputFactory.newInstance()
    .createXMLStreamReader(new ByteArrayInputStream(document.getBytes()));

That call to getBytes will use the platform default encoding, rather than UTF-8. If the default is something else, the parser receives bytes that are not valid UTF-8.
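To make the mismatch concrete, here is a minimal sketch, assuming a platform whose default encoding behaves like ISO-8859-1 (that choice, the class name DefaultEncodingDemo, and the sample document are illustrative assumptions, not taken from the question). It encodes the same string two ways and checks whether a strict UTF-8 decoder accepts the result:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DefaultEncodingDemo {
    // Returns true only if the bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document containing one non-ASCII character.
        String document = "<name>caf\u00e9</name>";

        // Stand-in for what getBytes() yields on a non-UTF-8 platform.
        byte[] latin1 = document.getBytes(StandardCharsets.ISO_8859_1);
        // What the parser needs when it is fed raw bytes as UTF-8.
        byte[] utf8 = document.getBytes(StandardCharsets.UTF_8);

        System.out.println("Platform default: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

Printing Charset.defaultCharset() on each Hadoop node is a quick way to see whether the default actually differs from UTF-8 there.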

You could specify "utf-8" as the encoding name - but it would be simpler to create a StringReader, which skips the byte conversion entirely:

XMLStreamReader reader = XMLInputFactory.newInstance()
    .createXMLStreamReader(new StringReader(document));

Of course that may not be the only error, but it's at least something to look at.
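The StringReader approach can be tried in isolation with a small program. This is only a sketch under assumptions: the class name StringReaderParseDemo, the helper textContent, and the sample document are hypothetical, and the real Hadoop record reader will look different.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StringReaderParseDemo {
    // Parses the document from characters, so no byte encoding is involved,
    // and returns the concatenated character data.
    static String textContent(String document) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(document));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped only to keep this sketch free of checked exceptions.
            throw new IllegalStateException("XML parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample document containing one non-ASCII character.
        String document = "<name>caf\u00e9</name>";
        System.out.println(textContent(document));
    }
}
```

If the ByteArrayInputStream has to stay, the equivalent change is document.getBytes(StandardCharsets.UTF_8) in place of the no-argument getBytes().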

Answer 1 (accepted), 2013-11-25 14:03:57