
Parse HTML from a web page that uses infinite scroll

I would like to parse the HTML of a web page that uses infinite scroll, such as pinterest.com, so that I get all of the items. This is my current code:

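import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// urlPinterest is a field defined elsewhere in the class (the search URL prefix).
public List<String> popularTagsPinterest(String tag) {
    List<String> results = new ArrayList<>();
    try {
        // Jsoup fetches only the initial HTML; it does not run the JavaScript
        // that loads further items when the page is scrolled.
        Document doc = Jsoup.connect(
                urlPinterest + tag + "&eq=%23" + tag + "&etslf=6622&term_meta[]=%23" + tag + "%7Cautocomplete%7C0")
                .timeout(90000).get();
        Elements images = doc.select("a.pinImageWrapper img.pinImg");
        for (Element img : images) {
            String src = img.attr("src");
            results.add(src);
            System.out.println(src);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return results;
}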

Find the base URL and the AJAX call that the page makes to load each additional batch of items. Requesting that call directly, once per batch, will do the job.

This page is a good example:

https://blog.scrapinghub.com/2016/06/22/scrapy-tips-from-the-pros-june-2016
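
To make the idea concrete, here is a minimal sketch of the paging loop in Java. It assumes the AJAX endpoint returns a batch of items together with some cursor or bookmark for the next batch, which is a common pattern but not something confirmed for Pinterest. The names InfiniteScrollPager, Page, fetchAll and maxPages are hypothetical, and the actual HTTP request is passed in as a function so the loop itself stays independent of any particular site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: keeps requesting batches until the endpoint
// reports that there is nothing more to load.
class InfiniteScrollPager {

    /** One batch of items plus the cursor for the next batch (null when done). */
    static final class Page {
        final List<String> items;
        final String nextCursor;

        Page(List<String> items, String nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    /**
     * Collects items from every batch.
     *
     * @param fetchPage performs one AJAX request; it receives null for the
     *                  first batch and the previous cursor afterwards
     * @param maxPages  safety cap so a misbehaving endpoint cannot loop forever
     */
    static List<String> fetchAll(Function<String, Page> fetchPage, int maxPages) {
        List<String> all = new ArrayList<>();
        String cursor = null; // null means "first batch"
        for (int i = 0; i < maxPages; i++) {
            Page page = fetchPage.apply(cursor);
            all.addAll(page.items);
            // Stop when the endpoint gives no cursor or returns an empty batch.
            if (page.nextCursor == null || page.items.isEmpty()) {
                break;
            }
            cursor = page.nextCursor;
        }
        return all;
    }
}
```

In a real scraper, fetchPage would build the AJAX URL seen in the browser's network tab, request it (with Jsoup that could be Jsoup.connect(url).ignoreContentType(true).execute().body()), and parse the response into a Page. The URL format and the response fields depend entirely on the site, so they are left out here.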


 