Message: Invalid byte 1 of 1-byte UTF-8 sequence in Hadoop
I'm parsing XML with Hadoop, using code I got from here.
But I'm getting the following error:
FINISH_TIME="1385387129970" HOSTNAME="DEV140" ERROR="java.io.IOException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[18,3] Message: Invalid byte 1 of 1-byte UTF-8 sequence.
But my XML is encoded only in UTF-8, so how can I handle this?
I suspect this is the problem; at the very least, it's a problem:
XMLStreamReader reader = XMLInputFactory.newInstance()
.createXMLStreamReader(new ByteArrayInputStream(document.getBytes()));
That call to getBytes() will use the platform default encoding rather than UTF-8.
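To see why the platform default matters, here is a minimal, self-contained sketch that is not part of the original answer. It encodes the same string once with ISO-8859-1, standing in for a non-UTF-8 platform default (an assumption for illustration; the real default depends on the JVM and platform), and once with UTF-8, then checks each byte array with a strict UTF-8 decoder. The class name `DefaultCharsetDemo` and the helper `isValidUtf8` are hypothetical. Since JDK 18 the default charset is UTF-8 (JEP 400), so a bare getBytes() usually yields UTF-8 there; on older JVMs it does not have to.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Hypothetical helper: true if the bytes decode as strict UTF-8.
    // newDecoder() reports malformed input by throwing instead of substituting.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String document = "<name>Caf\u00e9</name>"; // contains a non-ASCII character

        // ISO-8859-1 stands in for a non-UTF-8 platform default (assumption):
        // it writes the accented character as the single byte 0xE9.
        byte[] latin1 = document.getBytes(StandardCharsets.ISO_8859_1);
        // Explicit UTF-8 writes it as the two-byte sequence 0xC3 0xA9.
        byte[] utf8 = document.getBytes(StandardCharsets.UTF_8);

        System.out.println("Platform default: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1 bytes are valid UTF-8? " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes are valid UTF-8? " + isValidUtf8(utf8));
    }
}
```

Running `main` reports `false` for the ISO-8859-1 bytes and `true` for the UTF-8 bytes: a parser that assumes UTF-8 is handed bytes it cannot decode, which is the kind of mismatch behind the error in the question.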
You could pass "utf-8" to getBytes as the encoding name, but it would be simpler to create a StringReader:
XMLStreamReader reader = XMLInputFactory.newInstance()
.createXMLStreamReader(new StringReader(document));
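Both fixes mentioned in this answer can be sketched side by side. This is a hedged, self-contained illustration rather than the original Hadoop code: `parseFromString` wraps the string in a `StringReader`, so no byte encoding happens at all, while `parseFromBytes` encodes explicitly as UTF-8 and passes the same encoding name to `createXMLStreamReader(InputStream, String)`. The class name, the helper names, and the idea of collecting the text of every element with a given name are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class Utf8SafeParsing {

    // Option 1: give the parser characters, not bytes, so no charset is involved.
    public static List<String> parseFromString(String document, String tag)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(document));
        return collectText(reader, tag);
    }

    // Option 2: encode explicitly as UTF-8 and tell the parser the same encoding.
    public static List<String> parseFromBytes(String document, String tag)
            throws XMLStreamException {
        byte[] bytes = document.getBytes(StandardCharsets.UTF_8);
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(bytes), "UTF-8");
        return collectText(reader, tag);
    }

    // Returns the text of every element whose local name equals tag.
    private static List<String> collectText(XMLStreamReader reader, String tag)
            throws XMLStreamException {
        List<String> values = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(tag)) {
                    // getElementText() expects text-only content and stops
                    // at the matching END_ELEMENT.
                    values.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws XMLStreamException {
        String document = "<records><name>Caf\u00e9</name><name>na\u00efve</name></records>";
        System.out.println(parseFromString(document, "name"));
        System.out.println(parseFromBytes(document, "name"));
    }
}
```

Both methods return the same list, with the non-ASCII characters intact. The `StringReader` route is the simpler one because it never leaves the `String` world; the byte route only works if the encoding used by `getBytes` and the encoding given to the parser agree.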
Of course, that may not be the only error, but it's at least something to look at.