Parsing a span inside a div using Jsoup

Given this HTML:

 <div id="cat-product-list" alt1="356623" class="product-list list_all_items_price price_new"><span id="wholesale_11_member_price" class="index-price special_price final_price" price="US$5.25"><strong class="final_price_strong">US$5.25</strong><b class="show_vip">(vip)</b></span><span id="wholesale_12_member_price" class="index-price special_price final_price" price="US$4.90" style="display: none"><strong class="final_price_strong">US$4.90</strong><b class="show_vip">(vip)</b></span><span id="wholesale_13_member_price" class="index-price special_price final_price" price="US$4.55" style="display: none"><strong class="final_price_strong">US$4.55</strong><b class="show_vip">(vip)</b></span><span id="wholesale_14_member_price" class="index-price special_price final_price" price="US$4.20" style="display: none"><strong class="final_price_strong">US$4.20</strong><b class="show_vip">(vip)</b></span><span id="shop_price_member_price_on" class="index-price shop_price" price="US$7.00"><strike>US$7.00</strike></span></div> 

I am trying to select the first span inside the div and then get the text of its strong element. So far I have managed to scrape the other fields successfully, but I could not get this one to work:

Document d = Jsoup.connect("http://www.emmacloth.com/Clothing-vc-7061.html?icn=clothing&ici=ec_navbar05").timeout(6000).get();
Elements elements = d.select("div#productsContent1_goods.products_category");
for (Element element : elements.select("div.box-product-list.list_all_items")) {
    System.out.println("start");
    String productImage = element.select("div.goods_aImg a img").attr("src");
    String productname = element.select("div.goods_mz a").attr("title");
    String productUrl = "http://www.emmacloth.com" + element.select("div.goods_mz a").attr("href");
    // String productPrice = element.select("div.product-list.list_all_items_price.price_new >span.index-price.special_price.final_price").toString();
    Elements priceElements = element.select(
            "div.product-list.list_all_items_price.price_new > span.index-price.special_price.final_price"
    );

    for (Element priceElement : priceElements) {
        System.out.println(priceElement.attr("price"));
    }
    // System.out.println(productPrice);
}

Within this div you are looking for the spans that have the classes index-price special_price final_price, and from those (I think) you want to extract the price.

Given the HTML provided in your question, the following code ...

String html = "<div id=\"cat-product-list\" alt1=\"356623\" class=\"product-list list_all_items_price price_new\">" +
    "<span id=\"wholesale_11_member_price\" class=\"index-price special_price final_price\" price=\"US$5.25\">" +
    "<strong class=\"final_price_strong\">US$5.25</strong>" +
    "<b class=\"show_vip\">(vip)</b>" +
    "</span>" +
    "<span id=\"wholesale_12_member_price\" class=\"index-price special_price final_price\" price=\"US$4.90\" style=\"display: none\">" +
    "<strong class=\"final_price_strong\">US$4.90</strong>" +
    "<b class=\"show_vip\">(vip)</b>" +
    "</span>" +
    "<span id=\"wholesale_13_member_price\" class=\"index-price special_price final_price\" price=\"US$4.55\" style=\"display: none\">" +
    "<strong class=\"final_price_strong\">US$4.55</strong>" +
    "<b class=\"show_vip\">(vip)</b>" +
    "</span>" +
    "<span id=\"wholesale_14_member_price\" class=\"index-price special_price final_price\" price=\"US$4.20\" style=\"display: none\">" +
    "<strong class=\"final_price_strong\">US$4.20</strong>" +
    "<b class=\"show_vip\">(vip)</b>" +
    "</span>" +
    "<span id=\"shop_price_member_price_on\" class=\"index-price shop_price\" price=\"US$7.00\"><strike>US$7.00</strike></span>" +
    "</div>";

Document doc = Jsoup.parse(html);

// this selector selects the div(s) having classes: "product-list list_all_items_price price_new"
// and within that div, it selects the span(s) having the classes: "index-price special_price final_price"
Elements priceElements = doc.select(
        "div.product-list.list_all_items_price.price_new > span.index-price.special_price.final_price"
);

for (Element priceElement : priceElements) {
    System.out.println(priceElement.attr("price"));
}

... will print out the product prices:

US$5.25
US$4.90
US$4.55
US$4.20
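
The question asks for only the first span and the text of its strong child, rather than every price attribute. A minimal sketch of that, assuming Jsoup 1.11.1 or later (where selectFirst is available) and a trimmed copy of the HTML above. The class name FirstPriceExample and the method firstPriceText are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstPriceExample {

    // Returns the text of the <strong> inside the first matching price span,
    // or null when the markup contains no such element.
    static String firstPriceText(String html) {
        Document doc = Jsoup.parse(html);
        // selectFirst returns the first match in document order, or null.
        Element strong = doc.selectFirst(
                "div.product-list.price_new > span.final_price > strong.final_price_strong");
        return strong == null ? null : strong.text();
    }

    public static void main(String[] args) {
        // Trimmed version of the HTML from the question (two spans only).
        String html = "<div class=\"product-list list_all_items_price price_new\">"
                + "<span class=\"index-price special_price final_price\" price=\"US$5.25\">"
                + "<strong class=\"final_price_strong\">US$5.25</strong><b class=\"show_vip\">(vip)</b></span>"
                + "<span class=\"index-price special_price final_price\" price=\"US$4.90\">"
                + "<strong class=\"final_price_strong\">US$4.90</strong></span>"
                + "</div>";
        System.out.println(firstPriceText(html)); // prints US$5.25
    }
}
```

The null check matters because selectFirst returns null when nothing matches, whereas select returns an empty Elements collection.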

Update

In response to this comment:

For some reason it's not working for the whole website, can you check my modified question

The following code ...

Document d =
        Jsoup.connect("http://www.emmacloth.com/Clothing-vc-7061.html?icn=clothing&ici=ec_navbar05").timeout(6000).get();
for (Element element : d.select("div#productsContent1_goods.products_category > div.box-product-list.list_all_items")) {
    System.out.println("start");
    String productImage = element.select("div.goods_aImg > a > img").attr("src");
    String productname = element.select("div.goods_mz > a").attr("title");
    String productUrl = "http://www.emmacloth.com" + element.select("div.goods_mz > a").attr("href");

    System.out.println(productImage);
    System.out.println(productname);
    System.out.println(productUrl);
}

... will print:

http://img.ltwebstatic.com/images/pi/201710/3b/15090086488079557831_thumbnail_220x293.jpg
Pearl Embellished Bow Tied Bell Cuff Blouse
http://www.emmacloth.com/Pearl-Embellished-Bow-Tied-Bell-Cuff-Blouse-p-403325-cat-1733.html
... etc

So far, so good. But what about the price? If you look at the source of this webpage, you'll see that the price element is dynamic content, generated by the category_price JS function on that page. That element does not exist in the static HTML that Jsoup downloads, and Jsoup does not execute JavaScript, so it cannot read it. To read dynamic content you'll have to use a web driver such as Selenium.
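
One way this could look, as a rough sketch only: let a browser render the page, then hand the rendered HTML to Jsoup and reuse the same selector. This assumes Selenium 4 with Chrome and a matching chromedriver installed, an arbitrary 10-second wait, and that the site still serves the markup shown in the question, none of which is confirmed here. The class name RenderedPriceExample and the method extractPrices are hypothetical:

```java
import java.time.Duration;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class RenderedPriceExample {

    // Same selector as in the answer above.
    static final String PRICE_SELECTOR =
            "div.product-list.list_all_items_price.price_new > span.index-price.special_price.final_price";

    // Pure Jsoup step: collect the "price" attribute of every matching span.
    static List<String> extractPrices(String renderedHtml) {
        Document doc = Jsoup.parse(renderedHtml);
        return doc.select(PRICE_SELECTOR).eachAttr("price");
    }

    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver(); // assumes chromedriver is available
        try {
            driver.get("http://www.emmacloth.com/Clothing-vc-7061.html?icn=clothing&ici=ec_navbar05");
            // Wait until the page's JavaScript has inserted at least one price span.
            new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(By.cssSelector(PRICE_SELECTOR)));
            // getPageSource returns the DOM as the browser currently sees it.
            for (String price : extractPrices(driver.getPageSource())) {
                System.out.println(price);
            }
        } finally {
            driver.quit(); // always close the browser
        }
    }
}
```

Keeping the Jsoup parsing in its own method means the selector logic can be checked against a saved HTML string without starting a browser.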
