How to extract data in sequence using jsoup

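I am trying to fetch data from this link, https://orderup.com/some/phoenix/delivery/featured, using jsoup, but I am running into two problems: the resulting data is not formatted correctly, and categories that have a description are not displayed. Here is my code: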

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class grabber {
     public static void main(String[] args) throws Exception {
            String url = "https://orderup.com/restaurants/bella-pizza-r3834/delivery";
            Document document = Jsoup.connect(url).get();
            Elements restname = document.select("h1.urbana");
            System.out.println("restname: " + restname.text());
            Elements restaddressdiv = document.select("address.desktop-address");
            Elements restauranthours = document.select("div.restaurant-hours-region");
            Elements restauranthoursa = restauranthours.select("div.restaurant-hours-region");
            Elements restauranthoursregion = restauranthoursa.select("dt");
            System.out.println("restauranthosssurs: " + restauranthoursregion.size());
            for (Element resthours : restauranthoursregion) {
                System.out.println("restauranthours: " + resthours.text());
            }
            Elements h3 = document.select("div.menu-category");
            Elements h3tag = h3.select("h3");
            for(Element e : h3tag)
            {
                 System.out.println("Category: " + e.text());  

                 if (e.nextElementSibling().select("p").size() == 1) {
                     Elements itemtitlep =e.nextElementSibling().select("p");
                     Elements itemtitle = e.nextElementSibling().select("span.item-title");
                     System.out.println(itemtitle.size());
                        int itemtitleCount = itemtitle.size();
                        System.out.println("ifffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff statement");
                        for(Element itema : itemtitle)
                        {
                            System.out.println("Items: " + itema.text());
                            Elements itemtitleprice = itema.nextElementSibling().select(".item-price");
                            Elements itemtitledes = itema.getElementsByTag("p");
                            for(Element itempricea : itemtitleprice)
                            {
                                System.out.println("price: " + itempricea.text());
                            }
                            for(Element itemdesc : itemtitledes)
                            {
                                System.out.println("itemdesc: " + itemdesc.text());
                            }
                        }
                } else {
                    Elements itemtitle = e.nextElementSibling().select("span.item-title");
                    int itemtitleCount = itemtitle.size();
                    System.out.println(itemtitleCount);
                    System.out.println("elssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss statement");
                    for(Element itema : itemtitle)
                    {
                        System.out.println("Items: " + itema.text());
                        Elements itemtitleprice = itema.nextElementSibling().select(".item-price");
                        Elements itemtitledes = itema.getElementsByTag("p");
                        for(Element itempricea : itemtitleprice)
                        {
                            System.out.println("price: " + itempricea.text());
                        }
                        for(Element itemdesc : itemtitledes)
                        {
                            System.out.println("itemdesc: " + itemdesc.text());

                        }
                    }
                }
            }
        }
}

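The problem is that the HTML of the page you are accessing is a bit more flexible than you expect. For example, some categories carry a sub-text under the main category heading, and that sub-text is placed as the next sibling of the h3 tag. In those categories e.nextElementSibling() returns the sub-text rather than the item list, so the item selectors find nothing. A more robust, and also more readable, approach might look like this: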

Elements elh3s = document.select("div.menu-category h3");
for (Element elh3 : elh3s){
    System.out.println("Category: " + elh3.text());

    // step up to the enclosing category, then CSS-select its list items
    Elements ellis = elh3.parent().select("ul>li");
    for (Element elli : ellis){
        System.out.println("title: " 
            + elli.select("span.item-title").first().text());
        System.out.println("price: " 
            + elli.select("span.item-price").first().text());
        System.out.println("--");
    }
}
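
The approach above can be sketched as a self-contained program that runs without network access. The markup in SAMPLE_HTML is hypothetical: it only mimics the structure described in this answer (an h3, an optional sub-text paragraph, then a ul of items), and the real page may differ. The class name MenuSketch and the method extract are names made up for this illustration. Unlike the snippet above, it null-checks the result of selectFirst, so an item without a price does not throw a NullPointerException.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MenuSketch {

    // Hypothetical markup that mimics the structure described in the answer;
    // it is an assumption, not a copy of the real page.
    static final String SAMPLE_HTML =
          "<div class='menu-category'>"
        + "<h3>Pizza</h3>"
        + "<p>All pizzas are 14 inch.</p>"
        + "<ul>"
        + "<li><span class='item-title'>Margherita</span>"
        + "<span class='item-price'>$9.00</span></li>"
        + "<li><span class='item-title'>Pepperoni</span></li>"
        + "</ul>"
        + "</div>"
        + "<div class='menu-category'>"
        + "<h3>Drinks</h3>"
        + "<ul>"
        + "<li><span class='item-title'>Cola</span>"
        + "<span class='item-price'>$1.50</span></li>"
        + "</ul>"
        + "</div>";

    // Walks every category in document order and returns one output line
    // per heading, optional description, item title and item price.
    static List<String> extract(String html) {
        Document document = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element category : document.select("div.menu-category")) {
            Element heading = category.selectFirst("h3");
            if (heading == null) {
                continue; // category without a heading: skip it
            }
            lines.add("Category: " + heading.text());

            // The optional sub-text is the next sibling of the h3.
            Element next = heading.nextElementSibling();
            if (next != null && "p".equals(next.tagName())) {
                lines.add("description: " + next.text());
            }

            // Select the items relative to the category, not the sibling.
            for (Element item : category.select("ul > li")) {
                Element title = item.selectFirst("span.item-title");
                Element price = item.selectFirst("span.item-price");
                lines.add("title: " + (title != null ? title.text() : ""));
                lines.add("price: " + (price != null ? price.text() : "n/a"));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        extract(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

To run it against the live page, Jsoup.parse(html) could be swapped for Jsoup.connect(url).get() as in the question, provided the page really uses these class names.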

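Suggestion:

Take a look at the jsoup CSS selectors. They are very powerful, and since you have already parsed the page with jsoup, you can use them freely with hardly any performance cost.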
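
As a rough illustration of what the selectors can express, the sketch below uses three of them on a small inline document: the adjacent-sibling combinator (h3 + p) to pick up category sub-texts, the :has() pseudo-selector to keep only items that carry a price, and a plain class selector combined with eachText(). The markup, the class name SelectorSketch and the method names are assumptions made for this example, not taken from the real page.

```java
import java.util.List;

import org.jsoup.Jsoup;

public class SelectorSketch {

    // Hypothetical markup, assumed for illustration only.
    static final String SAMPLE_HTML =
          "<div class='menu-category'>"
        + "<h3>Pizza</h3>"
        + "<p>All pizzas are 14 inch.</p>"
        + "<ul>"
        + "<li><span class='item-title'>Margherita</span>"
        + "<span class='item-price'>$9.00</span></li>"
        + "<li><span class='item-title'>Pepperoni</span></li>"
        + "</ul>"
        + "</div>"
        + "<div class='menu-category'>"
        + "<h3>Drinks</h3>"
        + "<ul>"
        + "<li><span class='item-title'>Cola</span>"
        + "<span class='item-price'>$1.50</span></li>"
        + "</ul>"
        + "</div>";

    // "h3 + p" matches a <p> that immediately follows an <h3>.
    static String descriptions(String html) {
        return Jsoup.parse(html).select("div.menu-category > h3 + p").text();
    }

    // ":has(...)" keeps only the <li> elements that contain a price span.
    static int pricedItemCount(String html) {
        return Jsoup.parse(html)
                .select("div.menu-category li:has(span.item-price)")
                .size();
    }

    // eachText() returns the text of every matched element as a list.
    static List<String> titles(String html) {
        return Jsoup.parse(html).select("span.item-title").eachText();
    }

    public static void main(String[] args) {
        System.out.println("descriptions: " + descriptions(SAMPLE_HTML));
        System.out.println("priced items: " + pricedItemCount(SAMPLE_HTML));
        System.out.println("titles: " + titles(SAMPLE_HTML));
    }
}
```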

