Using Scrapy with Python, I fail to download images

I'm trying to scrape a few images from a website. Sorry in advance: I'm not very experienced with Python, and this is the first time I'm trying Scrapy.

Apparently I manage to get all the images I need, but they somehow get lost and my output folder remains empty.

I looked at a few tutorials and at all the similar questions I could find on SO, but nothing seemed to work.

My spider:

from testspider.items import TestspiderItem
import datetime
import scrapy

class PageSpider(scrapy.Spider):
    
    name = 'page-spider'
    start_urls = ['http://scan-vf.co/one_piece/chapitre-807/1']

    def parse(self, response):
        SET_SELECTOR = '.img-responsive'
        page = 1
        
        for imgPage in response.css(SET_SELECTOR):
            IMAGE_SELECTOR = 'img ::attr(src)'

            imgURL = imgPage.css(IMAGE_SELECTOR).extract_first()
            title = 'op-807-' + str(page)

            page += 1

            yield TestspiderItem({'title':title, 'image_urls':[imgURL]})

My items:

import scrapy

class TestspiderItem(scrapy.Item):

    title = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()

My settings:

BOT_NAME = 'testspider'
SPIDER_MODULES = ['testspider.spiders']
NEWSPIDER_MODULE = 'testspider.spiders'
DEFAULT_ITEM_CLASS = 'testspider.items'
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGE_STORE = '/home/*******/documents/testspider/output'

If you could be so kind as to help me understand what's missing or incorrect, I would be grateful.

If you check the page source (usually Ctrl+U in a browser), you'll find that each img looks something like this:

<img class="img-responsive" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src=' https://www.scan-vf.co/uploads/manga/one_piece/chapters/chapitre-807/01.jpg ' alt='One Piece: Chapter chapitre-807 - Page 1'/>
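A quick way to confirm what this tag contains is to parse it with Python's standard library. The sketch below is only an illustration (the helpers `ImgCollector` and `gif_size` are hypothetical, not part of Scrapy): it uses `html.parser` to read both attributes and decodes the `src` data URI, which turns out to be a 1x1 GIF placeholder, while `data-src` holds the real JPEG URL. The `data-src` value in the quoted source is padded with spaces, so the sketch calls `.strip()` on it.

```python
import base64
import struct
from html.parser import HTMLParser

# The <img> tag quoted above from the page source.
HTML = (
    '<img class="img-responsive" '
    'src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" '
    "data-src=' https://www.scan-vf.co/uploads/manga/one_piece/chapters/chapitre-807/01.jpg ' "
    "alt='One Piece: Chapter chapitre-807 - Page 1'/>"
)


class ImgCollector(HTMLParser):
    """Hypothetical helper: collect attributes of <img class="img-responsive"> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <img ... /> here.
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "img" and "img-responsive" in classes:
            self.images.append(attrs)


def gif_size(data_uri):
    """Hypothetical helper: return (width, height) of a base64-encoded GIF data URI."""
    _, payload = data_uri.split(",", 1)
    raw = base64.b64decode(payload)
    if raw[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    # GIF logical screen size: two little-endian uint16 values after the 6-byte header.
    return struct.unpack("<HH", raw[6:10])


parser = ImgCollector()
parser.feed(HTML)
img = parser.images[0]

print("src size:", gif_size(img["src"]))  # -> (1, 1): just a placeholder
print("data-src:", img["data-src"].strip())  # the real image URL
```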

As you can see, the real image URL is in data-src, while src only holds an inline placeholder image, so you need to use data-src in your code instead of src:

IMAGE_SELECTOR = 'img ::attr(data-src)'
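Separately from the selector, the settings in the question contain a typo that would also leave the output folder empty: Scrapy's images pipeline reads its target directory from `IMAGES_STORE` (plural), not `IMAGE_STORE`, and as far as I know `ImagesPipeline` disables itself (it raises `NotConfigured`) when no store is configured. A sketch of the relevant `settings.py` lines, keeping the question's redacted path exactly as it was:

```python
# settings.py (sketch): only the lines relevant to downloading images.

ITEM_PIPELINES = {
    # Built-in Scrapy pipeline: downloads every URL in an item's 'image_urls'
    # field and stores the download results in the item's 'images' field.
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

# Note the plural: IMAGES_STORE. The question's IMAGE_STORE is not a setting
# the pipeline looks at. Path kept exactly as redacted in the question.
IMAGES_STORE = '/home/*******/documents/testspider/output'
```

`ImagesPipeline` also requires the Pillow library to be installed.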
