Scrapy: don't follow links to images
Is there a way in Scrapy to not follow <a> tags pointing to images?
For example:
<a href="http://jamsphere.com/wp-content/uploads/2015/11/Franki-Dennull-PROFILE.jpg">
My code at the moment:
for url in set(response.xpath('//a/@href').extract()):
    yield scrapy.Request(url, callback=self.parse)
Obviously I can add a hard-coded check, but I was wondering if there is a built-in option?
Use a LinkExtractor. By default it filters out the common image, video, audio, and file extensions (its deny_extensions argument defaults to scrapy.linkextractors.IGNORED_EXTENSIONS, which includes jpg).