How to get the content from a website using Jsoup

I am trying to get data from a website with this code:

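// javax.servlet imports shown; newer containers use the jakarta.servlet packages instead
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

@WebServlet(description = "get content from teamforge", urlPatterns = { "/JsoupEx" })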
public class JsoupEx extends HttpServlet {
    private static final long serialVersionUID = 1L;
    private static final String URL = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505";

    public JsoupEx() {
        super();
    }

    protected void doGet(HttpServletRequest request,
            HttpServletResponse response) throws ServletException, IOException {
        Document doc = Jsoup.connect(URL).get();
        for (Element table : doc.select("table.DataTbl")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Index 2 is read below, so the row needs at least three cells
                if (tds.size() > 2) {
                    System.out.println(tds.get(0).text() + ":"
                            + tds.get(2).text());
                }
            }
        }
    }
}

I am using the jsoup parser. When I run it, I get no errors, but also no output.

Any help would be appreciated.

With the following code:

public class Tester {
    private static final String URL = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505";


    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect(URL).get();
        System.out.println(doc);

    }

}

I get a java.net.SocketTimeoutException: Read timed out. I think the particular URL you are trying to crawl responds too slowly for Jsoup's default timeout. I am in Europe, so my connection to the site may be slower than yours. However, you might want to check for this exception in the log of your application server.

By setting the timeout to 10 seconds, I was able to download and parse the document:

Connection connection = Jsoup.connect(URL);
connection.timeout(10000);
Document doc = connection.get();
System.out.println(doc);
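
If it helps, here is a minimal sketch of the same timeout setting with the timeout case caught explicitly, so a slow response shows up as a clear message instead of silence. The class name TimeoutFetch and the helper fetch are hypothetical names for illustration; only Jsoup.connect, timeout and get come from the code above.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TimeoutFetch {
    private static final String URL =
        "http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505";

    // Hypothetical helper: fetches a page and turns a timeout into a
    // descriptive IOException that names the URL and the limit used.
    static Document fetch(String url, int timeoutMillis) throws IOException {
        try {
            return Jsoup.connect(url).timeout(timeoutMillis).get();
        } catch (SocketTimeoutException e) {
            throw new IOException(
                "Timed out after " + timeoutMillis + " ms fetching " + url, e);
        }
    }

    public static void main(String[] args) {
        try {
            Document doc = fetch(URL, 10000); // 10 seconds, as above
            System.out.println(doc.title());
        } catch (IOException e) {
            // Inside a servlet this message would end up in the server log.
            System.err.println("Fetch failed: " + e.getMessage());
        }
    }
}
```

If 10 seconds is still not enough from your network, the second argument is the value to raise.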

With the rest of your code I get:

Population:78,413
Population Change Since 1990:53.00%
Population Density:6,897
Male:41,137
Female:37,278
.....
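
To check the row extraction separately from the network, the same selectors can be run against an inline HTML string. This is only a sketch: the table markup is made up for illustration (the real page's structure is not shown in this thread), and TableRows and extract are hypothetical names. It requires at least three cells in a row before reading index 2, so short rows are skipped rather than throwing.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableRows {
    // Made-up sample markup; an assumption, not the real page.
    static final String SAMPLE_HTML =
        "<table class=\"DataTbl\">"
      + "<tr><th>Statistic</th><th>Chart</th><th>Value</th></tr>"
      + "<tr><td>Population</td><td></td><td>78,413</td></tr>"
      + "<tr><td>Notes</td><td>short row</td></tr>"
      + "<tr><td>Male</td><td></td><td>41,137</td></tr>"
      + "</table>";

    // Collects "label:value" from the first and third cell of each data row.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element table : doc.select("table.DataTbl")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                // Header rows (th only) and short rows are skipped here.
                if (tds.size() > 2) {
                    lines.add(tds.get(0).text() + ":" + tds.get(2).text());
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints "Population:78,413" then "Male:41,137"
        extract(SAMPLE_HTML).forEach(System.out::println);
    }
}
```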

Thanks Julien. I tried the following code and still get a SocketTimeoutException. The code is:

Connection connection = Jsoup.connect("http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505");
connection.timeout(10000);
Document doc = connection.get();
System.out.println(doc);
