Can't figure out how to scrape specific text - Using Jsoup

I just started learning how to use Jsoup. I think I've successfully selected this section of the HTML, and I extracted "DARK SOULS III Deluxe Edition" with .select("span.title").text(), but I'm trying to get the prices, in this case $84.98 and $55.23. I tried .select("div.col search_price responsive_secondrow").text(), but it comes back blank. Could someone help me figure out how to extract that part? Thanks in advance! Here's the HTML of that section of the page.

The full HTML is at view-source:http://store.steampowered.com/search/?filter=topsellers

<a href="http://store.steampowered.com/sub/94174/?snr=1_7_7_topsellers_150_1"  data-ds-packageid="94174" data-ds-appid="374320,442010"onmouseover="GameHover( this, event, 'global_hover', {&quot;type&quot;:&quot;sub&quot;,&quot;id&quot;:94174,&quot;public&quot;:1,&quot;v6&quot;:1} );" onmouseout="HideGameHover( this, event, 'global_hover' )" class="search_result_row ds_collapse_flag" >
                <div class="col search_capsule"><img src="http://cdn.edgecast.steamstatic.com/steam/subs/94174/capsule_sm_120.jpg?t=1476893662"></div>
                <div class="responsive_search_name_combined">
                    <div class="col search_name ellipsis">
                        <span class="title">DARK SOULS III Deluxe Edition</span>
                        <p>
                            <span class="platform_img win"></span>                          </p>
                    </div>
                    <div class="col search_released responsive_secondrow">12 Apr, 2016</div>
                    <div class="col search_reviewscore responsive_secondrow">
                                                        <span class="search_review_summary positive" data-store-tooltip="Very Positive&lt;br&gt;86% of the 29,204 user reviews for games in this bundle are positive.">
                            </span>
                                                </div>


                    <div class="col search_price_discount_combined responsive_secondrow">
                        <div class="col search_discount responsive_secondrow">
                            <span>-35%</span>
                        </div>
                        <div class="col search_price discounted responsive_secondrow">
                            <span style="color: #888888;"><strike>$84.98</strike></span><br>$55.23                          </div>
                    </div>
                </div>


                <div style="clear: left;"></div>
            </a>

Use doc.select("a.search_result_row") instead:

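import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSteamTest {

    public static void main(String[] args) throws IOException {

        // Fetch the top sellers search page; a user agent is set on the request
        Document doc = Jsoup.connect("http://store.steampowered.com/search/?filter=topsellers").userAgent("Mozilla")
                .get();

        // Each search result is an <a> element with the class search_result_row
        Elements rows = doc.select("a.search_result_row");

        // Print the combined text (title, release date, prices) of every row
        for (Element row : rows) {
            System.out.println(row.text());
        }
    }
}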

You will get a list like this:

PLAYERUNKNOWN'S BATTLEGROUNDS 23 Mar, 2017 29,99€
Steel Division: Normandy 44 Coming Soon 39,99€
DARK SOULS™ III 11 Apr, 2016 -50% 59,99€ 29,99€
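
If you want the title and the price as separate fields rather than one line of text per row, you can run further selects on each row. Here is a minimal sketch that parses a hand-written HTML string instead of the live page. The class name RowFieldsSketch, the method describeRows and the second row ("Example Game", $19.99) are made up for illustration, and I'm assuming non-discounted rows use the same search_price class without "discounted".

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowFieldsSketch {

    // Hand-written stand-in for two search result rows (not live Steam data).
    // The second row is hypothetical and only shows the non-discounted case.
    static final String SAMPLE_HTML =
            "<a class=\"search_result_row ds_collapse_flag\">"
          + "<span class=\"title\">DARK SOULS III Deluxe Edition</span>"
          + "<div class=\"col search_price discounted responsive_secondrow\">"
          + "<span><strike>$84.98</strike></span><br>$55.23</div>"
          + "</a>"
          + "<a class=\"search_result_row ds_collapse_flag\">"
          + "<span class=\"title\">Example Game</span>"
          + "<div class=\"col search_price responsive_secondrow\">$19.99</div>"
          + "</a>";

    // Builds one "title | price text" line per search result row.
    static List<String> describeRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element row : doc.select("a.search_result_row")) {
            // Selecting on the row keeps each title paired with its own price.
            String title = row.select("span.title").text();
            // div.search_price matches any div whose class list contains search_price.
            String price = row.select("div.search_price").text();
            lines.add(title + " | " + price);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeRows(SAMPLE_HTML)) {
            System.out.println(line);
        }
    }
}
```

Selecting on a single class (div.search_price) keeps the loop working for both discounted and full-price rows. On the real page you would pass in the document from Jsoup.connect(...).get() instead of the sample string.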

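Your particular problem comes from the div that has multiple classes. In a CSS selector a space is the descendant combinator, so "div.col search_price responsive_secondrow" looks for a <responsive_secondrow> element inside a <search_price> element inside a div with the class col, and that matches nothing.

To select an element that has multiple classes, join the class names with dots instead of spaces in your select: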

doc.select("div.col.search_price.discounted.responsive_secondrow");

Take a look at this question: JSOUP get element with multiple classes
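
To see the difference between the two selectors, and to pull the two prices apart, here is a small sketch that runs against the price div from the question. The class name MultiClassSelectorSketch and the helper methods are hypothetical. Using strike for the old price and ownText() for the current one is my own suggestion and assumes the markup stays as shown.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MultiClassSelectorSketch {

    // The price div copied from the question's HTML snippet.
    static final String SAMPLE_HTML =
            "<div class=\"col search_price discounted responsive_secondrow\">"
          + "<span style=\"color: #888888;\"><strike>$84.98</strike></span><br>$55.23"
          + "</div>";

    // Counts how many elements a selector matches in the given HTML.
    static int countMatches(String html, String selector) {
        return Jsoup.parse(html).select(selector).size();
    }

    // Returns {original price, discounted price}, or an empty array if the div is missing.
    static String[] extractPrices(String html) {
        Document doc = Jsoup.parse(html);
        Element priceDiv = doc.select("div.col.search_price.discounted.responsive_secondrow").first();
        if (priceDiv == null) {
            return new String[0];
        }
        // The struck-through price lives inside the nested <strike> element.
        String original = priceDiv.select("strike").text();
        // ownText() returns only the text directly inside the div, skipping child elements.
        String discounted = priceDiv.ownText().trim();
        return new String[] {original, discounted};
    }

    public static void main(String[] args) {
        // Spaces: descendant combinator, so nothing matches.
        System.out.println(countMatches(SAMPLE_HTML, "div.col search_price responsive_secondrow"));
        // Dots: one element carrying all the listed classes.
        System.out.println(countMatches(SAMPLE_HTML, "div.col.search_price.discounted.responsive_secondrow"));

        String[] prices = extractPrices(SAMPLE_HTML);
        System.out.println("original: " + prices[0] + ", discounted: " + prices[1]);
    }
}
```

You don't have to list every class. div.search_price alone would also match this div, and it would keep matching rows that have no discount.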
