MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence

Question

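I am writing an XML parser class. When I run it, it sometimes works fine, but at other times it fails and throws this exception:

MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence

Can anyone explain why this happens?

Here is my code: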

package TRT;



import java.math.BigInteger;
import java.net.URL;
import java.net.URLConnection;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Gundem {


    public static void main(String[] args) {
        // TODO Auto-generated method stub

        Gundem gundem=new Gundem();
        try {
            URL url=new URL("http://www.trt.net.tr/rss/gundem.rss");
            URLConnection connection=url.openConnection();

            DocumentBuilderFactory builderFactory=DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder=builderFactory.newDocumentBuilder();
            Document document=docBuilder.parse(connection.getInputStream());

            Element element=document.getDocumentElement();

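            // getChildNodes() returns a NodeList; casting it to Node only happens to work
            // in Xerces, where the element object implements both interfaces.
            Node node=element;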
            System.out.println(node.getNodeName());


            NodeList nodeList=node.getChildNodes();
            Node channelNode=(Node)nodeList.item(0);
            System.out.println(channelNode.getNodeName());

            NodeList childNodeListOfChannelNode=channelNode.getChildNodes();

            for(int i=0;i<childNodeListOfChannelNode.getLength();i++){
                Node childNodesOfChannelNode=(Node)childNodeListOfChannelNode.item(i);
                System.out.println(childNodesOfChannelNode.getNodeName());

                if(childNodesOfChannelNode.getNodeName().equals(Constants.ITEM)){
                    Item item=new Item();
                    NodeList itemList=childNodesOfChannelNode.getChildNodes();
                    for(int j=0;j<itemList.getLength();j++){
                        Node childNodeOfItem=itemList.item(j);
                        if(childNodeOfItem.getNodeName().equals(Constants.TITLE)){
                            item.setTitle(childNodeOfItem.getTextContent());
                            System.out.println(item.getTitle());
                            System.out.println(gundem.dumpingInputAsHex(item.getTitle()));
                        }
                        else if(childNodeOfItem.getNodeName().equals(Constants.DESCRIPTION)){
                            item.setDescription(childNodeOfItem.getTextContent());
                            System.out.println(item.getDescription());
                        }

                    }

                }

            }

        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }



        System.exit(0);  // workaround for: JDWP Unable to get JNI 1.2 environment, jvm->GetEnv() return code = -2
    }

    public String dumpingInputAsHex(String input){
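        // Use an explicit charset so the hex dump does not depend on the platform default.
        return String.format("%40x",new BigInteger(1,input.getBytes(java.nio.charset.StandardCharsets.UTF_8)));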
    }

}

Answer 1

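Most likely, you are trying to parse as UTF-8 a document that is encoded in some other character set, such as ISO-8859-1. The parser is coming across an ISO-8859-1 character whose byte value is not allowed to stand on its own in UTF-8: a single-byte UTF-8 sequence must be in the range 0x00 to 0x7F, while accented ISO-8859-1 characters are single bytes of 0xA0 or above.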
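
The mismatch can be reproduced without the XML parser. The following is a minimal sketch, not the parser's own decoding code: it encodes a string containing "ü" as ISO-8859-1 and then decodes the bytes strictly as UTF-8. The class name Utf8Check and the method isValidUtf8 are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {

    // Returns true only if the bytes are well-formed UTF-8.
    // (Hypothetical helper for illustration.)
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "G\u00FCndem"; // the letter after G is u-umlaut
        // ISO-8859-1 stores u-umlaut as the single byte 0xFC.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 stores the same character as the two bytes 0xC3 0xBC.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

The ISO-8859-1 bytes are rejected because 0xFC cannot begin any UTF-8 sequence, which is the same kind of input the parser is complaining about.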

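To fix this, you need to determine the document's actual encoding, then create your own InputStreamReader from the return value of connection.getInputStream(), specifying the correct encoding. Then create an InputSource from the reader and pass it to docBuilder.parse().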
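
A minimal sketch of that approach follows. It assumes the feed is ISO-8859-1, which is only a guess for the demo; the real encoding has to be checked against what the server sends. An in-memory byte array stands in for connection.getInputStream() so the example runs offline, and the class name ExplicitEncodingParse is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ExplicitEncodingParse {

    // Decodes the stream with the given charset before the parser sees it,
    // so the parser works on characters and does not guess the byte encoding.
    static Document parse(InputStream in, Charset charset)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            return docBuilder.parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for connection.getInputStream(): a tiny feed encoded as ISO-8859-1.
        byte[] feed = "<rss><channel><title>G\u00FCndem</title></channel></rss>"
                .getBytes(StandardCharsets.ISO_8859_1);

        Document document = parse(new ByteArrayInputStream(feed), StandardCharsets.ISO_8859_1);
        System.out.println(document.getDocumentElement().getTextContent());
    }
}
```

In the question's code, the equivalent change would be to pass connection.getInputStream() and the confirmed charset to a helper like this instead of handing the raw stream to docBuilder.parse().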

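Further research:

I ran your code in Eclipse (JDK 7) and was able to reproduce the error. I then set exception breakpoints in Eclipse for both MalformedByteSequenceException exceptions, and with the breakpoints in place it did not fail. Tracing through the code, I was able to see the invalid character in the input buffer only once. This suggests to me that there is a race condition bug somewhere in the Xerces parser.

You may have to file a bug report with Oracle.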
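
Because the failure is intermittent, it may help to buffer each response before parsing it, so that the exact bytes of a failing run can be kept and compared with a successful one. The following is only a sketch of that idea; the class BufferedFeedCheck and its methods readAll and parses are hypothetical names, and in-memory data stands in for the network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.SAXException;

class BufferedFeedCheck {

    // Reads the whole stream into memory so the exact bytes can be inspected later.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Returns true if the buffered bytes parse as XML, false if the parser rejects them.
    static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return true;
        } catch (SAXException | IOException e) {
            // Malformed byte sequences surface here; the caller still holds the bytes.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser configuration", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // In the question's code this would be readAll(connection.getInputStream()).
        byte[] body = readAll(new ByteArrayInputStream(
                "<rss><title>G\u00FCndem</title></rss>".getBytes(StandardCharsets.ISO_8859_1)));

        if (!parses(body)) {
            // The failing payload is still available, e.g. to save to a file for a bug report.
            System.out.println("Parse failed; captured " + body.length + " bytes for inspection");
        }
    }
}
```

If the captured bytes of a failing run are themselves invalid UTF-8, the problem lies with what the server sent on that request rather than with the parser.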

Solution 1
0 2014-07-28 03:27:19
