Using Scrapy with Python I fail to download images
I'm trying to scrape a few images from a website. Sorry in advance: I am not very experienced with Python, and this is the first time I have tried using Scrapy.
I apparently manage to get all the images I need, but they somehow get lost and my output folder remains empty.
I looked at a few tutorials and all the similar questions I could find on SO, but nothing really worked.
My spider:
    from testspider.items import TestspiderItem
    import datetime
    import scrapy

    class PageSpider(scrapy.Spider):
        name = 'page-spider'
        start_urls = ['http://scan-vf.co/one_piece/chapitre-807/1']

        def parse(self, response):
            SET_SELECTOR = '.img-responsive'
            page = 1
            for imgPage in response.css(SET_SELECTOR):
                IMAGE_SELECTOR = 'img ::attr(src)'
                imgURL = imgPage.css(IMAGE_SELECTOR).extract_first()
                title = 'op-807-' + str(page)
                page += 1
                yield TestspiderItem({'title':title, 'image_urls':[imgURL]})
My items:
    import scrapy

    class TestspiderItem(scrapy.Item):
        title = scrapy.Field()
        image_urls = scrapy.Field()
        images = scrapy.Field()
My settings:
BOT_NAME = 'testspider'
SPIDER_MODULES = ['testspider.spiders']
NEWSPIDER_MODULE = 'testspider.spiders'
DEFAULT_ITEM_CLASS = 'testspider.items'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGE_STORE = '/home/*******/documents/testspider/output'
If you could be so kind as to help me understand what's missing or incorrect, I would be grateful.
If you check the page source (usually Ctrl+U in a browser), you'll find that each img looks something like this:
<img class="img-responsive" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src=' https://www.scan-vf.co/uploads/manga/one_piece/chapters/chapitre-807/01.jpg ' alt='One Piece: Chapter chapitre-807 - Page 1'/>
As you can see, src only holds an inline base64 GIF placeholder, while the real image URL is in data-src. So you need to use data-src in your code instead of src:
IMAGE_SELECTOR = 'img ::attr(data-src)'
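
To see the difference between the two attributes without running the spider, here is a minimal sketch that parses the sample tag above with the standard library's html.parser instead of Scrapy. The names LazyImageParser and extract_image_urls are hypothetical, made up for this illustration. It also shows that the data-src value in the sample has spaces around it, so it probably needs to be stripped before it is used as a URL.

```python
from html.parser import HTMLParser

# Sample markup copied from the page source shown above.
SAMPLE_HTML = (
    '<img class="img-responsive" '
    'src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" '
    "data-src=' https://www.scan-vf.co/uploads/manga/one_piece/chapters/chapitre-807/01.jpg ' "
    "alt='One Piece: Chapter chapitre-807 - Page 1'/>"
)


class LazyImageParser(HTMLParser):
    """Collect (src, data-src) pairs for <img> tags with class img-responsive.

    Hypothetical helper for illustration only; it is not part of Scrapy.
    """

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> also end up here.
        if tag != "img":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "img-responsive" not in classes:
            return
        self.images.append((attributes.get("src"), attributes.get("data-src")))


def extract_image_urls(html):
    """Return the real image URLs (from data-src), with whitespace stripped."""
    parser = LazyImageParser()
    parser.feed(html)
    parser.close()
    return [data_src.strip() for _src, data_src in parser.images if data_src]


if __name__ == "__main__":
    print(extract_image_urls(SAMPLE_HTML))
```

In the spider, the same idea would be to call .strip() on the string returned by extract_first() before putting it in image_urls, and to skip the item if the result is None. Separately, note that Scrapy's images pipeline reads its output directory from the IMAGES_STORE setting; the settings in the question spell it IMAGE_STORE, which may be another reason the output folder stays empty.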