
What's the best way to disable image download in scrapy?

Asked 2019-11-11 12:17:41 · Tags: python, python-3.x, web-scraping, scrapy

Image download is not disabled by default.

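I wrote a spider that consumes nearly 2 GB of data per hour. Now I want to reduce my data consumption. Images are of no use to me, so I want to make sure they are not fetched.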

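Given that this is a P0 scenario, I'd expect it to be a simple flag in settings.py, but surprisingly I couldn't find one in the docs. I found plenty of details about ImagesPipeline (enabling these pipelines, their storage, and so on), but no flag for people who aren't interested in images. Please let me know if I'm missing something.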

1 Answer

Scrapy does not download images unless you explicitly tell it to.
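For reference, Scrapy's built-in media pipelines are opt-in: `ImagesPipeline` only runs if it is listed in `ITEM_PIPELINES`. Below is a minimal `settings.py` sketch, assuming a project where no image or file pipeline has been added. Outside these pipelines, Scrapy requests what your spider asks for (its start URLs and the `Request` objects it yields), plus housekeeping requests such as `robots.txt` when `ROBOTSTXT_OBEY` is enabled and redirect targets.

```python
# settings.py (sketch): Scrapy's media pipelines are opt-in.
# With the image/file pipelines left out of ITEM_PIPELINES, they never run,
# so no image requests are scheduled on their behalf.
ITEM_PIPELINES = {
    # "scrapy.pipelines.images.ImagesPipeline": 1,  # enabling this would download images
    # "scrapy.pipelines.files.FilesPipeline": 2,    # likewise for arbitrary files
}

# IMAGES_STORE = "/path/to/images"  # only relevant once ImagesPipeline is enabled
```

In other words, there is no "disable images" flag because there is nothing to disable: leaving these entries out is already the default.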

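You can see which URLs Scrapy downloads in its runtime log. If an image URL does not appear in the log, that image was not downloaded, even if the web page containing it was.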
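To check this on a real crawl, you can write the log to a file (for example `scrapy crawl myspider -s LOG_FILE=crawl.log`, where `myspider` is a placeholder name) and scan the per-response `Crawled (...) <GET ...>` DEBUG lines. The helper below is a hypothetical sketch, assuming the default log format and a hand-picked list of image extensions; image URLs without a recognizable extension would not be caught.

```python
import re
from urllib.parse import urlparse

# Matches the DEBUG line Scrapy's default log formatter emits per response, e.g.
# "Crawled (200) <GET https://example.com/> (referer: None)"
CRAWLED_RE = re.compile(r"Crawled \((\d{3})\) <(\w+) ([^\s>]+)>")

# Hypothetical list of extensions treated as images for this check.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".bmp", ".ico")


def crawled_image_urls(log_lines):
    """Return URLs from 'Crawled' log lines whose path ends in an image extension."""
    found = []
    for line in log_lines:
        match = CRAWLED_RE.search(line)
        if not match:
            continue  # not a per-response line (scraped items, stats, etc.)
        url = match.group(3)
        # Compare on the path only, so query strings like "?v=2" don't hide the extension.
        path = urlparse(url).path.lower()
        if path.endswith(IMAGE_EXTENSIONS):
            found.append(url)
    return found
```

Usage: `with open("crawl.log", encoding="utf-8") as f: print(crawled_image_urls(f))`. An empty list means the crawl fetched no image URLs of those types.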

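When you open a downloaded page in a web browser, the browser downloads the images on the fly. The images do not come from the downloaded web page: they are (usually) not embedded in it. The page only indicates where they are located on the Internet, and the web browser downloads them in order to display them. Scrapy does not.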
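The following sketch illustrates the point with the standard library's `html.parser`: for each image, the page contains only a `src` string. The class and function names are hypothetical; inside a Scrapy callback the rough equivalent would be `response.css("img::attr(src)").getall()`, which likewise returns strings and fetches nothing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "IMG SRC" is handled too.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references the way a browser would before fetching.
                self.sources.append(urljoin(self.base_url, src))


def image_references(html, base_url):
    """Return the image URLs a browser would fetch; nothing is downloaded here."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources
```

A browser would go on to request each of these URLs; Scrapy only would if your spider yielded a `Request` for them or an images pipeline did.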

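The only exception is images that are actually embedded in the HTML code, for example as base64. This is uncommon and probably not your case. When it does happen, you cannot prevent those images from being downloaded, because you cannot download a web page without part of its content.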
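For the base64 case, a rough way to see how much of a page's weight is inline images is to decode the `data:image/...;base64,` payloads. This is a regex-based sketch (the pattern and function name are assumptions, not part of Scrapy) and it will miss unusual encodings such as whitespace inside the payload.

```python
import base64
import binascii
import re

# Hypothetical pattern for base64 data URIs such as "data:image/png;base64,iVBOR..."
DATA_URI_RE = re.compile(r"data:image/[\w.+-]+;base64,([A-Za-z0-9+/=]+)")


def inline_image_bytes(html):
    """Return the total decoded size, in bytes, of base64 images embedded in html."""
    total = 0
    for payload in DATA_URI_RE.findall(html):
        try:
            total += len(base64.b64decode(payload))
        except binascii.Error:
            # Malformed padding or length: skip this payload rather than fail the page.
            continue
    return total
```

These bytes arrive as part of the HTML response itself, so no Scrapy setting can skip them; base64 also inflates them by about a third before any HTTP compression.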


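Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.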


Guangdong ICP Filing No. 18138465 © 2020-2024 STACKOOM.COM