
Selecting href of link with image inside using xpath

I'm using Scrapy to write a scraper that finds links with images inside them and grabs each link's href. The page I'm scraping is populated with image thumbnails, and clicking a thumbnail links to a full-size version of the image. I'd like to grab the full-size images.

The html looks somewhat like this:

<a href="example.com/full_size_image.jpg">
     <img src="example.com/image_thumbnail.jpg">
</a>

And I want to grab "example.com/full_size_image.jpg".

My current method of doing so is

img_urls = scrapy.Selector(response).xpath('//a/img/..').xpath("@href").extract()

But I'd like to reduce that to a single xpath expression, as I plan to allow the user to enter their own xpath expression string.

You can check whether an element has a particular child element by using a predicate:

response.xpath('//a[img]/@href').extract()

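Note that I'm using the response.xpath() shortcut and providing a single XPath expression. The predicate [img] keeps only the a elements that have an img as a direct child, and /@href then selects the href attribute of each of those elements.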
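To see the predicate at work without a running spider, here is a minimal sketch that uses only Python's standard library rather than Scrapy. The sample markup and the helper name hrefs_of_image_links are made up for illustration. ElementTree implements only a subset of XPath: it understands the a[img] predicate but cannot select attributes with a trailing /@href, so the sketch reads href from each matched element instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, written as well-formed XML so that
# ElementTree can parse it (real-world HTML usually needs an HTML parser).
SAMPLE = """
<html><body>
  <a href="example.com/full_size_image.jpg">
    <img src="example.com/image_thumbnail.jpg"/>
  </a>
  <a href="example.com/about.html">About</a>
  <a href="example.com/other_full.jpg">
    <span><img src="example.com/other_thumb.jpg"/></span>
  </a>
</body></html>
"""


def hrefs_of_image_links(markup):
    """Return the href of every <a> that has an <img> as a direct child."""
    root = ET.fromstring(markup)
    # ".//a[img]" matches <a> descendants that have at least one direct
    # <img> child; the text-only link and the nested-image link are skipped.
    return [a.get("href") for a in root.findall(".//a[img]")]


img_urls = hrefs_of_image_links(SAMPLE)
print(img_urls)
```

Only the first link is returned, because [img] tests direct children. With a full XPath 1.0 engine such as the one behind response.xpath(), a predicate like //a[.//img]/@href should also match links whose image is wrapped in another element.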
