Issues using Jsoup to connect to a webpage

This is my first time using Jsoup, and I am having trouble connecting to the URL I want to parse information from.

URL: http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0

I initially tried this, but got a timeout exception:

    Document doc = Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0").get();

Here is the exception:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:575)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:548)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:235)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:224)
    at ParseData.main(ParseData.java:18)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

I did some research online and found the .timeout(0) method, which sets the Jsoup timeout to infinite.
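
The timeout behaviour can be reproduced without jsoup or network access. The stack trace shows this jsoup version fetching through java.net.HttpURLConnection, and that class also treats a timeout of 0 as "wait forever". The sketch below is an illustration under stated assumptions: ReadTimeoutDemo, startSlowServer and statusFromSlowServer are hypothetical names, and a local server that waits 500 ms before answering stands in for a slow site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ReadTimeoutDemo {

    // Hypothetical stand-in for a slow site: waits delayMillis, then answers 200 "ok".
    static HttpServer startSlowServer(long delayMillis) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Returns the HTTP status code, or -1 if the read timed out.
    // HttpURLConnection interprets a timeout of 0 as an infinite timeout.
    static int statusFromSlowServer(long delayMillis, int timeoutMillis) {
        HttpServer server = null;
        try {
            server = startSlowServer(delayMillis);
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                return conn.getResponseCode();
            } catch (SocketTimeoutException e) {
                return -1; // the same "Read timed out" failure as in the first stack trace
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("100 ms timeout: " + statusFromSlowServer(500, 100)); // -1
        System.out.println("0 (infinite) timeout: " + statusFromSlowServer(500, 0)); // 200
    }
}
```

An infinite timeout can leave a program hanging if a server never answers, so in jsoup a finite but generous value such as .timeout(10000) (milliseconds) is often the safer choice. In this question, removing the limit did not fix anything by itself; the request then failed with the 403 shown next.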

Now when I try this:

    Document doc = Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0").timeout(0).get();

I get the following exception:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:598)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:548)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:235)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:224)
    at ParseData.main(ParseData.java:18)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Can anyone point me in the right direction on how to load this URL with Jsoup?

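Answer: A 403 (Forbidden) status means the server received the request but refuses to serve it. You just need to add a User-Agent header to the HTTP request, like this: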

    Document doc = Jsoup.connect("http://uselectionatlas.org/RESULTS/national.php?f=1&year=2008&off=0&elect=0")
            .userAgent("Mozilla/5.0")
            .timeout(0)
            .get();

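Some sites refuse requests from clients that do not identify themselves as a browser, and that is what is happening with this site. Adding a browser-like user agent gets the request past that check.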
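
The answer's explanation can likewise be checked offline. The sketch below assumes a hypothetical filter that returns 403 unless the User-Agent header starts with "Mozilla/"; the real rule used by uselectionatlas.org is not known and may differ. It calls HttpURLConnection.setRequestProperty directly instead of jsoup's userAgent(...), so it runs with the JDK alone. UserAgentDemo, startFilteringServer and statusFor are illustrative names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;

public class UserAgentDemo {

    // Hypothetical bot filter: anything that does not look like a browser gets 403 Forbidden.
    static HttpServer startFilteringServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            int status = (ua != null && ua.startsWith("Mozilla/")) ? 200 : 403;
            exchange.sendResponseHeaders(status, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    // userAgent == null keeps HttpURLConnection's default, typically of the form "Java/<version>".
    static int statusFor(String userAgent) {
        HttpServer server = null;
        try {
            server = startFilteringServer();
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            try {
                return conn.getResponseCode(); // returns 4xx codes instead of throwing
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default user agent: " + statusFor(null)); // 403
        System.out.println("Mozilla/5.0: " + statusFor("Mozilla/5.0")); // 200
    }
}
```

In jsoup itself, a 4xx or 5xx status normally surfaces as the HttpStatusException seen in the question; calling .ignoreHttpErrors(true).execute() instead returns a Connection.Response whose statusCode() can be inspected without an exception.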
