How do I use Scrapy to scrape JavaScript-rendered data from a website?

Question

Using Scrapy, I am trying to scrape the data inside the <script type="application/ld+json"> tag.

import json

class TestSpider(scrapy.Spider):
    name = 'content'
    start_urls = ['https://www.maserati.com/us/en/models/ghibli']

    def parse(self, response):
        for content in response.xpath('(//script[@type="application/ld+json"])/text()'):
            data = json.loads(content)
            yield {
                'name': data['name'],
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

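However, after running scrapy runspider test_spider.py -O test1.jl in the terminal, I don't get the test1.jl file I expected.

I just want to understand how this works.

Images and the website link for reference are below:

Image showing the JavaScript tag and the name attribute I want to extract

Image of my code and the terminal output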

https://www.maserati.com/us/en/models/ghibli

Answer 1

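You were so close... you just missed getall(). Without it, the loop iterates over Selector objects rather than strings, and json.loads() cannot parse a Selector.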

import scrapy
import json


class TestSpider(scrapy.Spider):
    name = 'content'
    start_urls = ['https://www.maserati.com/us/en/models/ghibli']

    def parse(self, response):
        # getall() returns the script contents as plain strings, which json.loads() can parse.
        for content in response.xpath('(//script[@type="application/ld+json"])/text()').getall():
            data = json.loads(content)
            yield {
                'name': data['name'],
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

(Although I don't see any "next" button on that page.)
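
One thing that might be worth guarding against: data['name'] assumes every JSON-LD block is a single object with a "name" key. A block can also hold a list of objects, or have no "name" at all. Here is a minimal sketch of a more tolerant helper. The function name iter_jsonld_names and the sample strings are made up for illustration, and I have not checked what the Maserati page actually contains.

```python
import json


def iter_jsonld_names(raw_blocks):
    """Yield the "name" value from each JSON-LD text block that has one.

    raw_blocks is an iterable of strings, for example the list returned by
    response.xpath('//script[@type="application/ld+json"]/text()').getall().
    """
    for raw in raw_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # Skip blocks that are not valid JSON instead of crashing the spider.
            continue
        # A JSON-LD block may hold a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "name" in item:
                yield item["name"]


# Sample strings standing in for the script contents (hypothetical data).
sample_blocks = [
    '{"@type": "Product", "name": "Ghibli"}',
    '[{"@type": "Organization", "name": "Maserati"}, {"@type": "WebSite"}]',
    'not json',
]
names = list(iter_jsonld_names(sample_blocks))
```

Inside parse() you could then loop over iter_jsonld_names(...) with the getall() result and yield {'name': name} for each value.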

Solution 1
0 2022-01-09 19:12:40
