Scrapy: don't follow links to images

Is there a way in Scrapy to not follow <a> tags pointing to images?

For example:

<a href="http://jamsphere.com/wp-content/uploads/2015/11/Franki-Dennull-PROFILE.jpg">

My code at the moment:

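# Collect the unique href values, then resolve each against the page URL
for url in set(response.xpath('//a/@href').getall()):
    yield scrapy.Request(response.urljoin(url), callback=self.parse)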

Obviously I can add a hard-coded check, but I was wondering if there is a built-in option?

Use a LinkExtractor; by default it filters out links with common image, video, audio, and other file extensions.

The ignored extensions are listed in IGNORED_EXTENSIONS in the scrapy.linkextractors module.
