Scraping HTML with JSoup, getting HTTP error, status 456

Question

I am trying to scrape a site (www.oddsportal.com) with JSoup, but I have run into an issue.

String url = "http://www.oddsportal.com/matches/";      
Document doc = null;
System.out.println("Connecting to " + url + "...");
try {
    doc = Jsoup.connect(url).get();
} catch (IOException e1) {
    e1.printStackTrace();
}

When I connect and do a "get", I get the following:

 Connecting to http://www.oddsportal.com/matches/...

       org.jsoup.HttpStatusException: HTTP error fetching URL. Status=456, 
       URL=http://www.oddsportal.com/matches/
            at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:435)
            at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:410)
            at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:164)
            at org.jsoup.helper.HttpConnection.get(HttpConnection.java:153)

What could be the cause? There seems to be no standard HTTP 456 status code, so I assume it's some sort of site-specific code? The site has a login function, but logging in is not mandatory for viewing the content. Other sites I have tried work just fine.

Answer 1

If you include a user agent, the request should go through and return the document:

Document doc = Jsoup.connect("http://example.com").userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0").get();
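To check whether the User-Agent header really is what the site objects to, it can help to look at the raw status code with and without it. Below is a minimal sketch that uses only the JDK's HttpURLConnection instead of JSoup, so it runs without extra dependencies. The class name UserAgentProbe, the methods prepare and statusOf, and the 10-second timeouts are made up for illustration; the User-Agent string is the one from the answer above. Whether the site returns 456 for a given header is not something this sketch can promise.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of JSoup: shows the User-Agent idea with JDK classes only.
final class UserAgentProbe {

    // Browser-like User-Agent string copied from the answer above.
    static final String BROWSER_UA =
        "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0";

    // Builds a request carrying the given User-Agent. No network traffic happens here;
    // the connection is only opened when the response is first read.
    static HttpURLConnection prepare(String url, String userAgent) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", userAgent);
            conn.setConnectTimeout(10_000); // assumed timeout, in milliseconds
            conn.setReadTimeout(10_000);    // assumed timeout, in milliseconds
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the request and returns the HTTP status code as reported by the server.
    static int statusOf(String url, String userAgent) throws IOException {
        HttpURLConnection conn = prepare(url, userAgent);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = "http://www.oddsportal.com/matches/";
        try {
            System.out.println("Status with browser User-Agent: " + statusOf(url, BROWSER_UA));
        } catch (IOException e) {
            System.out.println("Request failed: " + e);
        }
    }
}
```

If the status comes back as 200 with the browser-like header, the same header passed to JSoup's userAgent(...) call, as in the answer above, is the likely fix.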

solution1
2 2013-10-02 00:26:54
