How to deal with “Unexpected end of file from server”?

I want to use Jsoup to fetch the content of http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=550633c179112c8002bc6a0942d55b2a&artist=lucinda%20williams&track=lake%20charles

The code is:

    Document doc = Jsoup.connect("http://ws.audioscrobbler.com    /2.0/?method=track.getInfo&api_key=550633c179112c8002bc6a0942d55b2a&artist=lucinda williams&track=lake charles")
                        .userAgent("Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) Gecko/20100101 Firefox/20.0")
                        .timeout(5000)
                        .get();

However, it fails with an exception:

    Exception in thread "main" java.net.SocketException: Unexpected end of file from server
            at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:770)
            at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
            at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:767)
            at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
            at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1162)
            at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:397)
            at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:429)
            at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:410)
            at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:164)
            at org.jsoup.helper.HttpConnection.get(HttpConnection.java:153)
            at JsoupXML.main(JsoupXML.java:16)

But when I visit the URL in a browser, everything is OK. Besides, when I use the code above to fetch http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=550633c179112c8002bc6a0942d55b2a&artist=cher&track=believe , everything is OK too.

Do you know the reason, and do you have any good ideas for solving it?

Thanks for your attention, and sorry about my English.

Thanks to NeplatnyUdaj for the kind help; you gave me a wonderful hint. I forgot to replace whitespace and other special characters with %20, %26 and so on.
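For anyone hitting the same problem, here is a minimal sketch of building the URL with encoded parameter values instead of typing them by hand. The class and method names (TrackInfoUrl, encode, build) are made up for illustration, and "YOUR_API_KEY" is a placeholder. Note that URLEncoder produces form encoding, where a space becomes "+", so the sketch rewrites that to "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, not part of Jsoup or the Last.fm API.
class TrackInfoUrl {

    // Percent-encode a single query parameter value.
    // URLEncoder turns a space into "+", and a literal "+" into "%2B",
    // so replacing "+" afterwards only affects what used to be spaces.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Build the track.getInfo URL from raw (unencoded) values.
    static String build(String apiKey, String artist, String track) {
        return "http://ws.audioscrobbler.com/2.0/?method=track.getInfo"
                + "&api_key=" + encode(apiKey)
                + "&artist=" + encode(artist)
                + "&track=" + encode(track);
    }

    public static void main(String[] args) {
        System.out.println(build("YOUR_API_KEY", "lucinda williams", "lake charles"));
    }
}
```

The resulting string can then be passed to Jsoup.connect(...) in place of the hand-written URL. Only the parameter values are encoded; the "&" and "=" separators between parameters must stay as they are.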

Well. The exception means that the remote server closed the connection unexpectedly.

The answer below assumes that the spaces visible in the URL in the question's code are not actually there in your code.

There is really nothing much you can do except catch the exception and try again (or report an error to the user).

As for why the server closed the connection:

  • It did not like your request (retrying will not help here). Check the documentation for audioscrobbler:
    • Is the Host header present and correct (in your example it would be incorrect, since you have spaces in there)?
    • Do you have to include other headers to make a valid request?
    • Is that API key correct?
  • The server might be having issues at the moment, causing it to drop requests (this is where a retry might help).
  • It thinks you are making too many requests to it, and some anti-spam protection has been engaged (this is where a retry would hurt).
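The "catch the exception and try again" approach could look roughly like the sketch below. It assumes a small number of attempts with a growing pause between them, so that a server with rate limiting is not hammered. The class name Retry and the method withRetries are hypothetical, not part of Jsoup or the JDK.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, shown only to illustrate a bounded retry.
class Retry {

    // Runs the request up to maxAttempts times. Only IOException
    // (which includes java.net.SocketException) triggers a retry;
    // any other exception is passed straight to the caller.
    static <T> T withRetries(Callable<T> request, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Wait a little longer after each failure.
                    Thread.sleep(delayMillis * attempt);
                }
            }
        }
        // Every attempt failed: report the last error to the caller.
        throw last;
    }
}
```

With Jsoup on the classpath, a call might look like Retry.withRetries(() -> Jsoup.connect(url).get(), 3, 1000). Keep in mind that this only helps in the second case above; a request the server rejects as malformed will fail the same way on every attempt.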

On a related note: Including the API-key in the question might not be optimal.

Change the user agent (or at least define it).

More details: Scraping a site
