How to scrape JavaScript rendered data from a website using Scrapy?

Question

Using Scrapy, I'm trying to scrape the data inside the <script type="application/ld+json"> tag:

import json

class TestSpider(scrapy.Spider):
    name = 'content'
    start_urls = ['https://www.maserati.com/us/en/models/ghibli']

    def parse(self, response):
        for content in response.xpath('(//script[@type="application/ld+json"])/text()'):
            data = json.loads(content)
            yield {
                'name': data['name'],
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

However, I'm not getting the test1.jl file I expected after running scrapy runspider test_spider.py -O test1.jl in the terminal.

For a start, I just want the name, to see how it works.

Image and website link for inspection are given below:

Image that shows the javascript tag and the name property inside that I want to yield

Image of my code and the code in the terminal

https://www.maserati.com/us/en/models/ghibli

Answer 1

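You were so close, just missing getall(). response.xpath(...) returns Selector objects, while json.loads() expects a string; getall() extracts the matched text as a list of strings.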

import scrapy
import json


class TestSpider(scrapy.Spider):
    name = 'content'
    start_urls = ['https://www.maserati.com/us/en/models/ghibli']

    def parse(self, response):
        for content in response.xpath('(//script[@type="application/ld+json"])/text()').getall():
            data = json.loads(content)
            yield {
                'name': data['name'],
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

(I don't see any "next" button though)
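
If the page's JSON-LD turns out to be messier than a single object with a name key, the loop body can be hardened a little. Below is a minimal sketch, assuming the strings come from getall() as in the spider above; the helper name iter_jsonld_names and the sample strings are made up for illustration and are not taken from the real page.

```python
import json


def iter_jsonld_names(raw_blocks):
    """Yield the 'name' value of every JSON-LD object found in raw_blocks.

    raw_blocks is an iterable of strings, such as the list returned by
    response.xpath('//script[@type="application/ld+json"]/text()').getall().
    """
    for raw in raw_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # Skip script bodies that are not valid JSON.
            continue
        # A JSON-LD script may hold one object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            # Only dicts that actually carry a 'name' key are yielded.
            if isinstance(item, dict) and 'name' in item:
                yield item['name']


# Hypothetical sample strings, standing in for what getall() might return.
sample_blocks = [
    '{"@type": "Car", "name": "Ghibli"}',
    '[{"@type": "Organization", "name": "Maserati"}, {"@type": "WebSite"}]',
    'not valid json',
]
names = list(iter_jsonld_names(sample_blocks))
print(names)  # ['Ghibli', 'Maserati']
```

Inside parse, the spider could then loop over iter_jsonld_names(...) with the getall() result and yield {'name': name} for each value, which avoids a KeyError when a script block has no name.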

solution1
0 2022-01-09 19:12:40