
Unicode(0xb) error while parsing an XML file using Stax

While parsing an XML file, Stax produces an error:

Unicode(0xb) error: An invalid XML character (Unicode: 0xb) was found in the element content of the document.

See the link below for the XML line containing the special character, shown as "VI". It is not an alphabetical character: when you copy and paste it into Notepad, it appears as some symbol. I tried parsing the file using Stax, and it showed the error above.

[enter image description here]

Can somebody please give me a solution for this?

Thanks in advance.

0xB (vertical tab) is not a valid character in XML. The only valid characters before ASCII 32 (0x20, space) are 0x9 (tab), 0xA (line feed) and 0xD (carriage return).

In short, what you are trying to parse is NOT XML.

According to the XML W3C Recommendation, 0xb is not allowed in an XML file:

Character Range [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

So strictly speaking your input file is not an XML file.
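
The Char production quoted above can be turned into a small check to find out where a file goes wrong before handing it to a parser. This is only a sketch: the class name `XmlCharCheck` and its method names are made up for illustration and are not part of any library.

```java
final class XmlCharCheck {
    /** Returns true if the code point matches the XML 1.0 Char production [2]. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first invalid character in text, or -1 if there is none. */
    static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // reads a full surrogate pair as one code point
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String sample = "ab" + (char) 0x0B + "cd"; // vertical tab at index 2
        System.out.println(isXmlChar(0x0B));            // false
        System.out.println(firstInvalidIndex(sample));  // 2
    }
}
```

A lone surrogate is reported as invalid here, since `codePointAt` returns it unchanged and it falls outside every allowed range.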

This error occurs whenever an invalid XML character appears in the XML. When you open the file in Notepad++, such characters show up as VT, SOH, FF and so on; these are invalid XML characters. I am using XML version 1.0, and I validate text data with the following pattern before storing it in the database:

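// Escapes are written at the regex level (double backslashes): line-feed and carriage-return Unicode
// escapes do not compile inside a Java string literal, and code points above U+FFFF need the x{...} form.
Pattern p = Pattern.compile("[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");
retunContent = p.matcher(retunContent).replaceAll("");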

This ensures that no invalid character ends up in the XML.
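
If the file already exists and cannot be fixed at the source, the same idea can be applied while reading: wrap the input in a `Reader` that drops the disallowed characters before Stax sees them. The following is a rough sketch, not a definitive implementation. `XmlCleaningReader` and `textOf` are hypothetical names, and as a simplification surrogate chars are passed through unchanged, leaving pair checking to the parser. Keep in mind that silently dropping characters changes the data.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: a Reader that skips chars not allowed by XML 1.0. */
final class XmlCleaningReader extends FilterReader {
    XmlCleaningReader(Reader in) {
        super(in);
    }

    private static boolean keep(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || Character.isSurrogate(c)            // assumption: pairing is left to the parser
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c != -1 && !keep((char) c));
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            int w = off;                           // compact the kept chars in place
            for (int i = off; i < off + n; i++) {
                if (keep(buf[i])) {
                    buf[w++] = buf[i];
                }
            }
            if (w > off) {
                return w - off;
            }
            // Everything in this chunk was dropped: read again instead of returning 0.
        }
    }

    /** Parses xml through the cleaning reader and returns the concatenated character data. */
    static String textOf(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new XmlCleaningReader(new StringReader(xml)));
        StringBuilder sb = new StringBuilder();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.CHARACTERS) {
                sb.append(r.getText());
            }
        }
        r.close();
        return sb.toString();
    }
}
```

For a file on disk, the `StringReader` would be replaced by a reader opened with the file's actual encoding; the cleaning reader only works on decoded characters, so it does not help if the bytes are decoded with the wrong charset.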
