
Scraping an image source nested within several elements in JSoup

I've been using JSoup for the last couple of days to try to scrape some data from Amazon for my Android app project. I've gone through all of the tutorials on the JSoup website and many of the questions here on Stack Overflow. However, despite everything I've tried and all of the hours I've spent trying to extract the src attribute from the img element, nothing seems to be working.

The HTML from the website is listed below. What I want to extract is the src attribute of the img element that has the classes "a-dynamic-image" and "a-stretch-horizontal":

<ul class="a-unordered-list a-nostyle a-horizontal list maintain-height">
<li class="image item itemNo0 selected maintain-height"><span class="a-list-item">
    <span class="a-declarative" data-action="main-image-click" data-main-image-click="{}">
        <div id="imgTagWrapperId" class="imgTagWrapper">
            <img alt="MSI R9 390 GAMING 8G Graphics Card" src="(source URL that I want to extract)" data-old-hires="https://images-na.ssl-images-amazon.com/images/I/81RZgkZUJlL._SL1500_.jpg"  class="a-dynamic-image  a-stretch-horizontal" id="landingImage" data-a-dynamic-image="{&quot;https://images-na.ssl-images-amazon.com/images/I/81RZgkZUJlL._SX450_.jpg&quot;:[338,450],&quot;https://images-na.ssl-images-amazon.com/images/I/81RZgkZUJlL._SX425_.jpg&quot;:[319,425],&quot;https://images-na.ssl-images-amazon.com/images/I/81RZgkZUJlL._SX466_.jpg&quot;:[350,466],&quot;https://images-na.ssl-images-amazon.com/images/I/81RZgkZUJlL._SX355_.jpg&quot;:[266,355],&quot;https://images-na.ssl-images-amazon.com/images/I/81RZgkZUJlL._SX522_.jpg&quot;:[392,522]}" style="max-width:522px;max-height:392px;">
            </div>
        </span>
    </span></li>

My code in Android Studio is as follows:

Document doc = Jsoup.connect(url).get();
Element link= doc.select("ul.a-unordered-list a-nostyle a-horizontal list maintain-height").select("span.a-list-item span.a-declarative").select("span.a-declarative")
                   .select("div.imgTagWrapper").select("img.a-dynamic-image  a-stretch-horizontal").first();
String imageSRC = link.attr("src");

I would love to know exactly what I'm missing here since admittedly I am still very inexperienced with Java and especially JSoup. Any help would be greatly appreciated, thanks!

Try this:

Element link= doc.select("ul.a-unordered-list.a-nostyle.a-horizontal.list.maintain-height")
    .select("span.a-list-item span.a-declarative")
    .select("span.a-declarative")
    .select("div.imgTagWrapper")
    .select("img.a-dynamic-image.a-stretch-horizontal").first();
String imageSRC = link.attr("src");

To match an element that carries several classes, chain the class names with dots and no spaces:

.select("TAG.CLASS1.CLASS2.CLASS3")

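instead of the following, where each space is read as a descendant combinator, so CLASS2 and CLASS3 are treated as tag names of nested elements and nothing matches: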

.select("TAG.CLASS1 CLASS2 CLASS3")
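To see the difference without hitting the network, here is a minimal sketch that parses a trimmed-down copy of the markup from the question with Jsoup.parse and runs both selector forms against it. The class name SelectorDemo, the helper methods, and the placeholder src URL are made up for illustration; the id lookup at the end is an extra shortcut that assumes the id "landingImage" shown in the question is present and unique on the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {
    // Trimmed-down copy of the markup from the question.
    // The src value is a placeholder, not a real Amazon URL.
    static final String HTML =
          "<ul class=\"a-unordered-list a-nostyle a-horizontal list maintain-height\">"
        + "<li class=\"image item itemNo0 selected maintain-height\"><span class=\"a-list-item\">"
        + "<span class=\"a-declarative\">"
        + "<div id=\"imgTagWrapperId\" class=\"imgTagWrapper\">"
        + "<img alt=\"MSI R9 390 GAMING 8G Graphics Card\""
        + " src=\"https://example.com/placeholder.jpg\""
        + " class=\"a-dynamic-image  a-stretch-horizontal\" id=\"landingImage\">"
        + "</div></span></span></li></ul>";

    // Counts how many elements a CSS query matches in the sample markup.
    static int countMatches(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).size();
    }

    // Returns the src of the first matching element, or null when nothing matches.
    // The null check matters because first() returns null for an empty result.
    static String imageSrc(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element img = doc.select(cssQuery).first();
        return img == null ? null : img.attr("src");
    }

    public static void main(String[] args) {
        // Space form: read as "a tag named a-stretch-horizontal inside img.a-dynamic-image".
        System.out.println(countMatches("img.a-dynamic-image a-stretch-horizontal"));

        // Dot form: one img element that has both classes.
        System.out.println(countMatches("img.a-dynamic-image.a-stretch-horizontal"));

        // Extracting the attribute with the dot form.
        System.out.println(imageSrc(HTML, "img.a-dynamic-image.a-stretch-horizontal"));

        // Shortcut by id (assumes the id is unique on the page).
        System.out.println(imageSrc(HTML, "img#landingImage"));
    }
}
```

With the space form the result is empty, so first() returns null and the later link.attr("src") call throws a NullPointerException. Checking for null before reading the attribute makes that failure easier to spot.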

