
I'm not able to scrape data

I'm using Scrapy to scrape data from a website. Here is my code:

import scrapy


class ShopSpider(scrapy.Spider):
    name = 'shop'
    allowed_domains = ['https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers']
    start_urls = ['http://https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers/']
    custom_settings = {
        'FEED_URI': 'tmp/shop.csv'
    }

    def parse(self, response):

        titles = response.css('img::attr(title)').extract()
        images = response.css('img::attr(data-img)').extract()
        prices = response.css('.p_price::text').extract()
        discounts = response.css('.prd_discount::text').extract()

        for item in zip(titles, prices, images, discounts):
            scraped_info = {
                'title': item[0],
                'price': item[1],
                'image_urls': [item[2]],  # Sets the URL for Scrapy to download images
                'discount': item[3]
        }

        yield scraped_info

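Please check what I'm doing wrong. Also, I want to scrape all the data that loads as the page is scrolled. Should the spider fetch everything that scrolling would eventually load? If so, how do I go about it?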

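You have the following problems:

  1. Incorrect allowed_domains (only the domain is needed, not a full URL);
  2. Broken start_urls (the scheme appears twice, as http://https://, and there is a trailing slash after the query string);
  3. Wrong indentation of the yield in the parse function: it sits outside the for loop, so at most one item is yielded.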
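
As a rough illustration of points 1 and 2, the standard library's urllib.parse shows what each setting should contain. This is only a sketch: the helper name domain_for_allowed_domains is hypothetical and not part of Scrapy, and stripping a leading "www." is an assumption made here so that other subdomains would match as well.

```python
from urllib.parse import urlparse

START_URL = 'https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers'
BROKEN_URL = 'http://https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers/'


def domain_for_allowed_domains(url):
    """Return the bare domain of a URL (hypothetical helper, not a Scrapy API)."""
    host = urlparse(url).hostname or ''
    # Assumption: drop a leading 'www.' so other subdomains are covered too.
    return host[4:] if host.startswith('www.') else host


# The value that belongs in allowed_domains is a domain, not a URL.
print(domain_for_allowed_domains(START_URL))  # → shopclues.com

# With the doubled scheme, the parsed host is no longer the shop's domain.
print(urlparse(BROKEN_URL).hostname)  # → https
```

The second print shows why the original start_urls entry cannot work: the parser treats "https" as the host name, so the request never targets shopclues.com.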

Check the fixed code here:

import scrapy

class ShopSpider(scrapy.Spider):
    name = 'shop'
    allowed_domains = ['shopclues.com']
    start_urls = ['https://www.shopclues.com/mobiles-smartphones.html?sort_by=bestsellers']

    def parse(self, response):
        titles = response.css('img::attr(title)').extract()
        images = response.css('img::attr(data-img)').extract()
        prices = response.css('.p_price::text').extract()
        discounts = response.css('.prd_discount::text').extract()

        for item in zip(titles, prices, images, discounts):
            scraped_info = {
                'title': item[0],
                'price': item[1],
                'image_urls': [item[2]],  # Sets the URL for Scrapy to download images
                'discount': item[3]
            }

            yield scraped_info
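
To see why the indentation in point 3 matters, here is a small standalone sketch that does not need Scrapy. The function names and the sample rows are made up for illustration. With the yield outside the loop, the generator produces only the last item; with the yield inside the loop, it produces one item per row.

```python
def parse_yield_outside(rows):
    """Mimics the original spider: the yield is dedented out of the loop."""
    for title, price in rows:
        scraped_info = {'title': title, 'price': price}
    yield scraped_info  # runs once, after the loop has finished


def parse_yield_inside(rows):
    """Mimics the fixed spider: one item is yielded per iteration."""
    for title, price in rows:
        scraped_info = {'title': title, 'price': price}
        yield scraped_info


# Made-up sample data standing in for the zipped selector results.
sample_rows = [('Phone A', '100'), ('Phone B', '200')]

print(len(list(parse_yield_outside(sample_rows))))  # → 1
print(len(list(parse_yield_inside(sample_rows))))   # → 2
```

Note that the first variant would also raise an UnboundLocalError if the selectors matched nothing, because scraped_info would never be assigned.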
