Scrapy: not following image links

Question

Is there a way in Scrapy to not follow <a> tags that point to images?

For example:

<a href="http://jamsphere.com/wp-content/uploads/2015/11/Franki-Dennull-PROFILE.jpg">

My code at the moment:

for url in set(response.xpath('//a/@href').extract()):
    # Resolve relative hrefs against the page URL before requesting.
    yield scrapy.Request(response.urljoin(url), callback=self.parse)

Obviously I can add a hard-coded check, but I was wondering whether there is a built-in option.

Answer 1

Use a LinkExtractor. By default, it filters out links with common image, video, audio, and file extensions.

The ignored extensions are listed in IGNORED_EXTENSIONS in the scrapy.linkextractors module, which is the default value of the deny_extensions argument.

Answer 1: score 2, accepted, 2018-12-04 13:13:15
