DocumentBuilder.parse seems to randomly skip the beginning of the XML InputStream returned from an HTTP request

Question

I have the following code to send an HTTP request, receive the response (which is XML) and parse it:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public Document getDocumentElementFromDatabase() {
    // this URL is actually built dynamically from a query, but for this example I just use one of the possible resulting URLs
    String url = "http://musicbrainz.org/ws/2/recording?query=%22Thunderstruck%22+AND+artistname%3A%222Cellos%22";

    // declared outside the try block so that the finally block can see it
    HttpURLConnection connection = null;
    try {
        // sleep between successive requests to avoid flooding the server
        Thread.sleep(1000);
        connection = runQuery(url);
        InputStream stream = connection.getInputStream();
        if (stream != null) {
            BufferedInputStream buff = new BufferedInputStream(stream);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(buff);
        }
    }

    // I've grouped exception handling for this example
    catch (ParserConfigurationException | InterruptedException | SAXException | IOException e) {
        e.printStackTrace();
    }

    finally {
        if (connection != null) connection.disconnect();
    }

    return null;
}

// returns the configured connection, so the return type must be HttpURLConnection rather than void
private HttpURLConnection runQuery(String url) throws MalformedURLException, IOException {
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setRequestProperty("User-Agent", "MyAppName/1.0 ( myemail@email.email )");
    return connection;
}

This code is called many times, and sometimes I get the following error:

[Fatal Error] :1:1: Content is not allowed in prolog.

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    ...
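A minimal way to reproduce this exact error, assuming the response body is not XML at all: the hypothetical `PrologDemo` class below passes a JSON-looking string to the same `DocumentBuilder.parse` API. Xerces cannot start an XML document with `{`, so it fails at line 1, column 1, as in the trace above. With no `ErrorHandler` set, the default handler is likely also what prints the `[Fatal Error]` line to stderr.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original question.
final class PrologDemo {
    private PrologDemo() {}

    // Parses the body as XML; returns the parse error, or null if parsing succeeded.
    static SAXParseException tryParse(String body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            // e.g. "Content is not allowed in prolog." when the first character is not '<'
            return e;
        } catch (Exception e) {
            // configuration or I/O problems are not what this demo is about
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `PrologDemo.tryParse("{\"recordings\":[]}")` returns a `SAXParseException` whose `getLineNumber()` is 1, while a well-formed XML string returns `null`.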

If I access the URL in, say, Chrome, I get a valid XML response every time, no matter how many times I reload. What's more, the same issue does not seem to occur when I run the exact same code on my laptop.

After a bit of tinkering, I tried printing the InputStreams directly as strings (using method 4 from this link) instead of parsing them, and noticed that sometimes the response did not in fact start with the expected XML header (<?xml version="1.0" encoding="UTF-8" standalone="yes"?>), while other times it did.
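The debugging step described above (dumping the stream as a string instead of parsing it) can be sketched with only the JDK. The `StreamDebug` class and its `readAll`/`head` methods are hypothetical helpers, and the code assumes the body is UTF-8, which is what the expected XML header declares. Note that reading the stream consumes it, so the same stream cannot be parsed afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical debugging helper, not part of the original question.
final class StreamDebug {
    private StreamDebug() {}

    // Reads the whole stream into a String, assuming the body is UTF-8.
    // The stream is fully consumed by this call.
    static String readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // rethrown unchecked to keep this debugging helper easy to call
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Returns at most maxChars characters from the start of the body;
    // the first few characters are usually enough to tell XML from something else.
    static String head(String body, int maxChars) {
        return body.substring(0, Math.min(maxChars, body.length()));
    }
}
```

For example, `System.out.println(StreamDebug.head(StreamDebug.readAll(connection.getInputStream()), 80));` prints the first 80 characters of whatever the server actually returned.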

My guess is that I'm doing something wrong with the streams, but I can't figure out what.

Answer 1

I have found the problem. The site sometimes seemed to return a JSON response instead of XML, which made the parser fail. I've added the following line to runQuery:
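Because the failure depends on which format the server happens to send, it can help to look at the first byte before parsing. A sketch, assuming a hypothetical `PayloadSniffer` helper: it uses `BufferedInputStream.mark`/`reset` to peek at the first non-whitespace byte without consuming it, so the same `BufferedInputStream` can still be passed to `DocumentBuilder.parse`. A `<` suggests XML, while `{` or `[` suggests JSON. This is only a heuristic; a UTF-8 byte-order mark, for instance, is reported as `UNKNOWN`.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the original answer.
final class PayloadSniffer {
    enum Kind { XML, JSON, UNKNOWN }

    private PayloadSniffer() {}

    // Peeks at the first non-whitespace byte and rewinds the stream afterwards.
    // Assumes fewer than 1024 leading whitespace bytes (the mark's read limit).
    static Kind sniff(BufferedInputStream in) {
        try {
            in.mark(1024);
            int b;
            do {
                b = in.read();
            } while (b == ' ' || b == '\t' || b == '\r' || b == '\n');
            in.reset(); // nothing is consumed from the caller's point of view
            if (b == '<') return Kind.XML;
            if (b == '{' || b == '[') return Kind.JSON;
            return Kind.UNKNOWN; // includes an empty body (b == -1)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the question's code this would sit between creating `buff` and calling `parse(buff)`, for example by logging or skipping the response when `PayloadSniffer.sniff(buff) != PayloadSniffer.Kind.XML`.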

connection.setRequestProperty("Accept", "application/xml");

and I can now run the code without errors.
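For reference, `runQuery` with the answer's `Accept` header applied might look like the sketch below. The `MusicBrainzClient` wrapper and the `isXmlContentType` helper are hypothetical additions, not part of the answer: the helper inspects the value returned by `HttpURLConnection.getContentType()` so that an unexpected format can be reported clearly instead of surfacing as a `SAXParseException`.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

// Hypothetical wrapper around the question's runQuery, not the answerer's code.
final class MusicBrainzClient {
    private MusicBrainzClient() {}

    // Same as the question's runQuery plus the answer's Accept header,
    // so the server is asked explicitly for XML.
    static HttpURLConnection runQuery(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestProperty("User-Agent", "MyAppName/1.0 ( myemail@email.email )");
        connection.setRequestProperty("Accept", "application/xml");
        return connection;
    }

    // True for "application/xml", "text/xml" and "+xml" media types,
    // ignoring parameters such as "; charset=UTF-8". Null means no header.
    static boolean isXmlContentType(String contentType) {
        if (contentType == null) return false;
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("application/xml")
            || mediaType.equals("text/xml")
            || mediaType.endsWith("+xml");
    }
}
```

A caller could then guard the parse with something like `if (!MusicBrainzClient.isXmlContentType(connection.getContentType())) throw new IOException("Expected XML, got " + connection.getContentType());`.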

(Answer 1: score 0, accepted, 2018-01-31 21:00:51)
