Scrapy gets only the first 24 items of the page

Question

I tried several ways to scrape an IKEA category page and found that the last page actually shows all the items. But when I scrape the last page of IKEA's product listing, it returns only the first 24 items (which correspond to the items displayed on the first page). This is the URL of the page: https://www.ikea.com/fr/fr/cat/lits-bm003/?page=12

and this is the spider:

import scrapy


class SpiderSpider(scrapy.Spider):
    name = 'Ikea'
    pages = 9
    start_urls = ['https://www.ikea.com/fr/fr/cat/canapes-fu003/?page=12']

    def parse(self, response):
        # Each product card sits inside the product-list container.
        products = response.css('div.plp-product-list')
        for product in products:
            for p in product.css('div.range-revamp-product-compact'):
                # .get() returns the first match, or None if nothing matches,
                # instead of raising IndexError like getall()[0].
                yield {
                    'Title': p.css('div.range-revamp-header-section__title--small::text').get(),
                    'Price': p.css('span.range-revamp-price__integer::text').get(),
                    'Desc': p.css('span.range-revamp-header-section__description-text::text').get(),
                    'Img': p.css('img.range-revamp-aspect-ratio-image__image::attr(src)').get(),
                }

Answer 1

Scrapy doesn't run JavaScript (that's the job of a browser); it only receives the same response content that cURL would.

To do exactly what you describe, you need a browser-based solution such as Selenium (Python) or Cypress (JavaScript), ideally run as a headless browser. Alternatively, request each page separately.

There are probably better ways of doing this, but this addresses your exact question.
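
As a rough sketch of the "each page separately" option, the page URLs could be generated up front and handed to the spider. The helper name page_urls and the constant START_URLS are hypothetical, and 12 is assumed to be the last page only because that is the number in the question's URL. This also assumes the server returns a different batch of items for each page value without JavaScript, which the result in the question does not confirm.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def page_urls(base_url, last_page):
    """Return one URL per listing page, from page 1 to last_page inclusive."""
    parts = urlsplit(base_url)
    urls = []
    for page in range(1, last_page + 1):
        # Rebuild the URL with only the page number in the query string.
        query = urlencode({"page": page})
        urls.append(urlunsplit((parts.scheme, parts.netloc, parts.path, query, "")))
    return urls


# Assumption: 12 is the last page, as in the URL from the question.
START_URLS = page_urls("https://www.ikea.com/fr/fr/cat/canapes-fu003/", 12)
```

Assigning this list to the spider's start_urls attribute would make Scrapy call parse once per page. If every response still contains the same 24 items, the remaining items are loaded by JavaScript and a headless browser is needed.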
