How to read specific rows of a website using Java and Jsoup?

Question

First, I want to thank you all in advance for taking the time to help.

Next, I want to point out that I have already read this answer. When I inspect an element in Google Chrome on Stack Overflow it is really easy to understand, but on the webpage listed below the markup is kind of messy.

I want to be able to load information about the companies listed on this webpage: http://www.manta.com/mb_51_ALL_CVZ/carlstadt_nj?pg=1

Finally, this is my current code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws Exception {
        String url = "http://www.manta.com/mb_51_ALL_CVZ/carlstadt_nj?pg=1";
        Document doc = Jsoup.connect(url).get();

        String address = doc.select("").text();
        String telephone = doc.select("").text();
        String description = doc.select("").text();
        // I want to retrieve the address, the telephone number and the description
        // of each company listed on the website that I provided
    }
}

Answer 1

First of all, set a User-Agent string, so that the page your program receives is the same one you get in your browser:

Document doc = Jsoup.connect(url)
     .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0")
     .get();

The selector for the entire table is ul.list-group:nth-child(4),
and the selector for each row is ul.list-group:nth-child(4) > li:nth-child(X) > div:nth-child(1), where X is a number between 1 and the number of rows.
Inside each row, you can easily find the selectors for the address, phone and so on with your browser. For example, the address from the first row is given by ul.list-group:nth-child(4) > li:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(3) > div:nth-child(1) > div:nth-child(1) > div:nth-child(2) > span:nth-child(1).
Just loop through all the rows and extract whatever you need.
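
As a rough illustration of that loop, here is a minimal sketch. It runs against a small inline HTML string rather than the live page, so the markup, the class name RowLoopSketch, the method extractRows and the two in-row selectors are all assumptions made up for the example. On the real page you would pass the list selector given above and replace the in-row selectors with the ones your browser shows.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class RowLoopSketch {

    // Collects "address | phone" for every row of the list matched by listSelector.
    // The two in-row selectors are placeholders that fit the sample markup only.
    static List<String> extractRows(Document doc, String listSelector) {
        List<String> results = new ArrayList<>();
        // Dropping the :nth-child(X) on li matches the first div of every row at once.
        for (Element row : doc.select(listSelector + " > li > div:nth-child(1)")) {
            String address = row.select("span:nth-child(1)").text();
            String phone = row.select("span:nth-child(2)").text();
            results.add(address + " | " + phone);
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in markup; the structure of the live page will differ.
        String html = "<ul class=\"list-group\">"
                + "<li><div><span>1 Main St</span><span>555-0100</span></div></li>"
                + "<li><div><span>2 Side Ave</span><span>555-0101</span></div></li>"
                + "</ul>";
        Document doc = Jsoup.parse(html);
        for (String line : extractRows(doc, "ul.list-group")) {
            System.out.println(line);
        }
    }
}
```

Selecting relative to each row keeps the per-field selectors short, which tends to be less brittle than repeating the full chain from the top of the document for every field.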

Answer 1
1 2016-05-08 09:14:17

如何使用Java和Jsoup读取网站的特定行？

问题描述

1 个解决方案

解决方案1 1 2016-05-08 09:14:17

解决方案1
1 2016-05-08 09:14:17