
Invalid byte 2 of 2-byte UTF-8 sequence

I am trying to parse an XML file declared as <?xml version="1.0" encoding="UTF-8"?>, but I ran into the error message "Invalid byte 2 of 2-byte UTF-8 sequence". Does anybody know what causes this problem?

Most commonly it is caused by feeding the parser ISO-8859-x (Latin-x, such as Latin-1) content while the parser thinks it is getting UTF-8. Certain sequences of Latin-1 characters form something that is invalid as UTF-8. A letter with an accent or umlaut is a single byte in Latin-1, and many of those bytes have the form 110xxxxx, which UTF-8 reads as the first byte of a 2-byte sequence. The second byte must then have the form 10xxxxxx. In Latin-1 text the byte that follows is usually a plain ASCII character or another accented letter, so its high-order bits are not what the first byte promised.
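To make the byte-level explanation concrete, here is a minimal sketch that asks Java's strict UTF-8 decoder where the first bad sequence starts. The class and method names (Utf8Check, firstInvalidOffset) are made up for illustration, and the sample element is only an assumption about what such a file might contain.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /** Returns the offset of the first byte sequence that is not valid UTF-8, or -1 if everything decodes. */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than it consumes bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length);
        CoderResult result = decoder.decode(in, out, true);
        // On a malformed sequence the input position is left at the start of that sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // \u00C5 is "A with ring above"; the escape keeps this source file pure ASCII.
        String xml = "<n>\u00C5ke</n>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println("Latin-1 bytes: first invalid offset = " + firstInvalidOffset(latin1));
        System.out.println("UTF-8 bytes:   first invalid offset = " + firstInvalidOffset(utf8));
    }
}
```

In the Latin-1 bytes the accented letter is the single byte 0xC5 followed by "k" (0x6B), which is exactly the "first byte promises a continuation, second byte is not one" case described above.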

This can easily occur when some process writes out XML using Latin-1 but either forgets to output the XML declaration (in which case the XML parser must default to UTF-8, as per the XML specification) or claims the content is UTF-8 when it isn't.
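The missing-declaration case can be reproduced with a small sketch like the following, which uses the JDK's built-in DOM parser. The class and helper names are hypothetical, and the exact exception type and message depend on the parser implementation, so the code only reports whether parsing succeeded. The second helper shows one possible way out when the real encoding is known: decode the bytes yourself and give the parser characters instead of bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Latin1WithoutDeclaration {

    /** Returns the root element's text, or null if the parser rejects the input. */
    static String rootText(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            System.out.println("Parsing failed: " + e.getMessage());
            return null;
        }
    }

    /** Hands the parser raw bytes; with no declaration it has to assume UTF-8. */
    static String parseRawBytes(byte[] data) {
        return rootText(new InputSource(new ByteArrayInputStream(data)));
    }

    /** Decodes the bytes as Latin-1 first and hands the parser characters. */
    static String parseAsLatin1(byte[] data) {
        return rootText(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1)));
    }

    public static void main(String[] args) {
        // No XML declaration, content written as Latin-1 (\u00C5 is "A with ring above").
        byte[] latin1 = "<name>\u00C5ke</name>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Raw bytes:  " + parseRawBytes(latin1));
        System.out.println("As Latin-1: " + parseAsLatin1(latin1));
    }
}
```

Wrapping the stream in a Reader is only a workaround for data you cannot change; if you control the producer, writing real UTF-8 or declaring the actual encoding is the cleaner fix.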

Either the parser is set to UTF-8 even though the file is encoded differently, or the file is declared as UTF-8 but actually is not.

You could try changing the default character encoding used by String.getBytes() to UTF-8 with the VM option -Dfile.encoding=utf-8.
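As an alternative that does not depend on VM options, the charset can be passed to getBytes explicitly. This is a minimal sketch with a hypothetical helper name (toUtf8), not something the answer above prescribes. On JDK 18 and later UTF-8 is already the default charset, so the VM option mainly matters on older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {

    /** Encodes text as UTF-8 regardless of the JVM's default charset. */
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00C5 is "A with ring above": one byte in Latin-1, two bytes in UTF-8.
        String name = "\u00C5ke";
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 length:    " + toUtf8(name).length);
        System.out.println("Latin-1 length:  " + name.getBytes(StandardCharsets.ISO_8859_1).length);
    }
}
```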

I had the same problem. I created a new XML file with jdom and FileWriter(xmlFile), and the FileWriter was not able to create a UTF-8 file. Using FileOutputStream(xmlFile) instead solved it.
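The likely reason is that FileWriter(File) encodes with the platform default charset, while an OutputStream lets the XML library (or an explicit writer) choose the encoding. The sketch below shows the idea with plain JDK classes rather than jdom; the class and method names are invented for the example, and the temporary file exists only so the result can be inspected.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class WriteXmlUtf8 {

    /** Writes the XML text as UTF-8, independent of the platform default charset. */
    static void writeXml(File xmlFile, String xml) {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(xmlFile), StandardCharsets.UTF_8)) {
            writer.write(xml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes to a temporary file and returns the bytes that ended up on disk. */
    static byte[] writeAndReadBack(String xml) {
        try {
            File tmp = File.createTempFile("utf8-demo", ".xml");
            try {
                writeXml(tmp, xml);
                return Files.readAllBytes(tmp.toPath());
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C5ke</name>";
        byte[] onDisk = writeAndReadBack(xml);
        System.out.println("Bytes written: " + onDisk.length + ", characters: " + xml.length());
    }
}
```

Since Java 11 there is also a FileWriter(File, Charset) constructor, which is another way to avoid the default charset.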

For those who still get this error: since UTF-8 is being used, check your XML document for accented or other non-ASCII Latin letters. I had the same problem, and the reason was that I had this:

<n:name>Åke Jógvan Øyvind</n:name>

Hope this helps.

I had the same problem when trying to import my .xml file into my Java tool, and I found a good solution for it:
1. Open the .xml file with Notepad++ and save it as an .rtf file. Then open that file in WordPad.
2. Save the .rtf file as a .txt file, open it with Notepad, and save it as an .xml file again. When saving in Notepad, make sure to choose the option "Encoding: UTF-8" near the bottom of the Save dialog.
It worked for mine; I hope it's useful for yours too.
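The re-saving steps amount to re-encoding the file as UTF-8, which can also be done in a few lines of Java. This is only a sketch under the assumption that the file is really ISO-8859-1; the answer does not say what the original encoding was, and the class and method names are hypothetical. If the XML declaration names a different encoding, it would also need to be updated to match.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReencodeToUtf8 {

    /** Interprets the bytes as ISO-8859-1 (an assumption) and re-encodes the text as UTF-8. */
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java ReencodeToUtf8 <input.xml> <output.xml>");
            return;
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);
        // Read everything, convert, and write the UTF-8 result to a separate file.
        Files.write(output, latin1ToUtf8(Files.readAllBytes(input)));
    }
}
```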

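Switching the encoding that is passed for the input might help in this case:

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
// "in" is the InputStream with the XML data; pass the encoding the data is really in
XMLEventReader eventReader =
        inputFactory.createXMLEventReader(in, "utf-8" /* or e.g. "windows-1251" */);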


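Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.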
