Jsoup.parse mobile URL

I am new to Jsoup and I am trying to download a mobile website using Jsoup.parse(). The code below works fine for a normal URL but not for a mobile one. What's wrong with it?

Code:

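import java.io.IOException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Inside some class:
private static Document downloadDocument(String url, String referer, int timeout) {
    // Check for null first; calling isEmpty() on a null String throws NullPointerException.
    if (url == null || url.isEmpty()) {
        return null;
    }
    if (referer == null || referer.isEmpty()) {
        // Default to Google.
        referer = "http://www.google.com";
    }
    Document document;
    try {
        // Note: Jsoup.parse(URL, int) does not send the referer computed above.
        document = Jsoup.parse(new URL(url), timeout);
    } catch (IOException e) {
        // TODO - Remove System.out.println - Memory Issue.
        System.out.println("Sorry, unable to download document");
        return null;
    }
    return document;
}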

The stack trace is as follows:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://m.careerbuilder.com/
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:449)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:424)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:178)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:167)
    at org.jsoup.Jsoup.parse(Jsoup.java:183)
    ...

The website you want to parse checks the user agent and does not accept the default one (which is Java/jdk_version). That is why the server answers with HTTP 403 (Forbidden). So you should send a "fake" browser user agent instead, like this:

Document html = Jsoup.connect("http://m.careerbuilder.com")
        .userAgent("Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36")
        .get();
System.out.println(html);

Here, Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36 is the user-agent string of Chrome 32.0.1667.0.
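The user-agent check described above can be reproduced locally without hitting the real site. The following is a minimal, stdlib-only sketch that uses plain HttpURLConnection rather than jsoup. It starts a hypothetical local server that, as an assumption standing in for whatever check m.careerbuilder.com actually performs, answers 403 unless the User-Agent header starts with "Mozilla/". The class name UserAgentDemo, the filter rule and the helper methods are illustrative only.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UserAgentDemo {

    // Hypothetical filter: accept only browser-like user agents (starting with "Mozilla/").
    // Anything else, including the JDK's default "Java/<version>", gets 403 Forbidden.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean rejected = ua == null || !ua.startsWith("Mozilla/");
            byte[] body = (rejected ? "Forbidden" : "Hello, browser")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(rejected ? 403 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Sends a GET and returns the HTTP status; a null userAgent keeps the JDK default header.
    static int fetchStatus(URL url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            con.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return con.getResponseCode(); // returns 403 without throwing
        } finally {
            con.disconnect();
        }
    }

    // Returns {status with the default user agent, status with a browser-like user agent}.
    static int[] runDemo() {
        try {
            HttpServer server = startServer();
            try {
                int port = server.getAddress().getPort();
                URL url = URI.create("http://127.0.0.1:" + port + "/").toURL();
                int withDefault = fetchStatus(url, null);
                int withBrowser = fetchStatus(url,
                        "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
                        + "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36");
                return new int[] {withDefault, withBrowser};
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] codes = runDemo();
        System.out.println("default UA: " + codes[0] + ", browser UA: " + codes[1]);
    }
}
```

Run as-is, it should print 403 for the default user agent and 200 for the browser-like one. In jsoup, the equivalent of setRequestProperty("User-Agent", ...) is the .userAgent(...) call shown above; jsoup's Connection also has referrer(String) and timeout(int), so the referer computed in the question's downloadDocument could be sent the same way.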