Jsoup.parse Mobile url
I'm new to Jsoup and I'm trying to download a mobile website using Jsoup.parse(). The code below works for normal URLs but not for mobile ones. What is going on?
private static Document downloadDocument(String url, String referer, int timeout) {
    // Check for null before calling isEmpty() to avoid a NullPointerException.
    if (url == null || url.isEmpty()) {
        return null;
    }
    if (referer == null || referer.isEmpty()) {
        // Default to Google.
        // Note: referer is never sent; Jsoup.parse(URL, int) takes no referrer.
        referer = "http://www.google.com";
    }
    Document document;
    try {
        document = Jsoup.parse(new URL(url), timeout);
    } catch (IOException e) {
        // TODO - Remove System.out.println - Memory Issue.
        System.out.println("Sorry, unable to download document");
        return null;
    }
    return document;
}
The stack trace is as follows:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://m.careerbuilder.com/
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:449)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:424)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:178)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:167)
at org.jsoup.Jsoup.parse(Jsoup.java:183)
...
The site you are trying to parse checks the user agent and does not accept the default one (i.e. Java/jdk_version). So you should use a "fake" user agent, like this:
Document html = Jsoup.connect("http://m.careerbuilder.com").userAgent("Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36").get();
System.out.println(html);
where Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36 is the user agent of Chrome 32.0.1667.0.
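To see the mechanism in isolation, here is a minimal sketch that uses only the JDK and no jsoup. It assumes a hypothetical local server (`startPickyServer`) that stands in for the real site and rejects any request whose User-Agent is missing or starts with `Java/`. The class name, method names, and the rejection rule are illustrative assumptions; the real site's check is not known.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

class UserAgentDemo {

    // Hypothetical stand-in for a site that refuses Java's default user agent.
    static HttpServer startPickyServer() throws IOException {
        // Port 0 lets the OS pick any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean looksLikeJava = ua == null || ua.startsWith("Java/");
            // 403 for Java-looking clients, 200 otherwise; -1 means "no response body".
            exchange.sendResponseHeaders(looksLikeJava ? 403 : 200, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Requests the local server and returns the HTTP status code.
    // Passing null leaves the JDK's default User-Agent ("Java/<version>") in place.
    static int fetchStatus(String userAgent) {
        HttpServer server = null;
        try {
            server = startPickyServer();
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            return conn.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        String chrome = "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36";
        System.out.println("default agent: " + fetchStatus(null));   // 403
        System.out.println("browser agent: " + fetchStatus(chrome)); // 200
    }
}
```

With jsoup itself, the equivalent change is to switch from Jsoup.parse(URL, int) to Jsoup.connect(url) as in the answer above. The Connection object also offers referrer(String) and timeout(int), which should cover the referer and timeout arguments of the original method.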