

How can I read specific lines of a website using Java and Jsoup?

First, thank you all in advance for taking the time to help.

Next, I want to point out that I have already read this answer. When I inspect elements in Google Chrome on Stack Overflow, the markup is really easy to understand, but on the web page listed below it is rather messy.

I want to be able to load information about the companies listed on this web page: http://www.manta.com/mb_51_ALL_CVZ/carlstadt_nj?pg=1

Finally, here is my current code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws Exception {
        String url = "http://www.manta.com/mb_51_ALL_CVZ/carlstadt_nj?pg=1";
        Document doc = Jsoup.connect(url).get();

        // I want to retrieve the address, the telephone number and the description
        // of each company listed on the website I provided.
        // TODO: I don't know which CSS selectors to use yet (empty placeholders below).
        String address = doc.select("").text();
        String telephone = doc.select("").text();
        String description = doc.select("").text();
    }
}

First of all, set a User-Agent string, so that the page your program receives is the same one your browser receives:

Document doc = Jsoup.connect(url)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0")
        .get();
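Jsoup's userAgent(...) call simply sets the User-Agent request header. Purely as a point of comparison, the sketch below sets the same header with the JDK's built-in java.net.http client (Java 11+) instead of Jsoup; the class name BrowserLikeRequest is hypothetical, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds a GET request carrying a browser-like User-Agent,
// using java.net.http rather than Jsoup (an alternative, not the answer's code).
final class BrowserLikeRequest {
    // Firefox User-Agent string copied from the answer.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0";

    // Builds (but does not send) the request; the header is what makes the
    // server treat the program like a regular browser.
    static HttpRequest build(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", USER_AGENT)
                .GET()
                .build();
    }
}
```

To actually fetch the page, such a request could be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the response body handed to Jsoup.parse(body, url); with Jsoup alone, the userAgent(...) call above does the same job in one chain.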

The selector for the entire table is ul.list-group:nth-child(4), and the selector for each row is ul.list-group:nth-child(4) > li:nth-child(X) > div:nth-child(1), where X is a number between 1 and the number of rows.
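The row pattern can be generated for every index X. This is a minimal sketch with a hypothetical RowSelectors helper that only builds the selector strings (no Jsoup needed); the selectors themselves are the answer's and reflect the page layout at the time, so they may no longer match the live site.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the per-row selectors described in the answer.
final class RowSelectors {
    // Selector for the whole results table, as given in the answer.
    static final String TABLE = "ul.list-group:nth-child(4)";

    // Selector for row x; CSS nth-child indices are 1-based.
    static String row(int x) {
        if (x < 1) {
            throw new IllegalArgumentException("nth-child index must be >= 1, got " + x);
        }
        return TABLE + " > li:nth-child(" + x + ") > div:nth-child(1)";
    }

    // Selectors for rows 1..rowCount, in order.
    static List<String> rows(int rowCount) {
        List<String> selectors = new ArrayList<>();
        for (int x = 1; x <= rowCount; x++) {
            selectors.add(row(x));
        }
        return selectors;
    }
}
```

Each string can then be passed to doc.select(...). The number of rows is not known in advance, so it still has to be found on the page, for example with doc.select(RowSelectors.TABLE + " > li").size().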
Inside each row, you can easily find the selectors for the address, phone number and so on using your browser's inspector. For example, the address in the first row is given by ul.list-group:nth-child(4) > li:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > div:nth-child(3) > div:nth-child(1) > div:nth-child(1) > div:nth-child(2) > span:nth-child(1).
Just loop through all the rows and extract whatever you need.
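Instead of building one nth-child(X) selector per row, Jsoup can also select every row at once and iterate over the resulting Elements. The sketch below does that with a hypothetical RowExtractor class; it needs the jsoup library on the classpath (a release recent enough to have Element.selectFirst), and its ROWS selector is derived from the answer's row selector, so it only works while the page keeps that layout.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: loops over all result rows and pulls one field from each.
final class RowExtractor {
    // All rows of the results table: the answer's row selector without nth-child(X) on li.
    static final String ROWS = "ul.list-group:nth-child(4) > li > div:nth-child(1)";

    // For each row, returns the text of the first element matching fieldSelector
    // inside that row, or "" when the row has no such element.
    static List<String> extract(Document doc, String fieldSelector) {
        List<String> values = new ArrayList<>();
        for (Element row : doc.select(ROWS)) {
            Element field = row.selectFirst(fieldSelector);
            values.add(field == null ? "" : field.text());
        }
        return values;
    }
}
```

On the live page, doc would be the Document fetched with the User-Agent shown earlier, and fieldSelector whatever the browser inspector reveals for the address, phone number or description inside a row. Returning "" for a missing field keeps the lists for different fields aligned row by row.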

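Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.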

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM