Splash-scrapy cannot render a specific JavaScript website

Question

I am trying to scrape the website https://www.sreality.cz/en/search/for-sale/apartments using the Scrapy framework.

Part of the site is written in JavaScript, so I tried using the Splash Docker container to give me HTML that I can easily parse.

I downloaded the scrapinghub/splash Docker image and started its container on port 8050 from the terminal.

% docker pull scrapinghub/splash

% docker run -p 8050:8050 scrapinghub/splash
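Before involving Scrapy at all, the running container can be checked directly through Splash's HTTP API: its render.html endpoint takes the target page as the url argument plus an optional wait in seconds, and returns the rendered HTML. The sketch below only builds that request URL with the standard library; the helper name splash_render_url and the 2-second default wait are illustrative assumptions, not part of Splash.

```python
from urllib.parse import urlencode


def splash_render_url(target, splash_base="http://localhost:8050", wait=2.0):
    """Build a GET URL for Splash's render.html endpoint (hypothetical helper).

    target      -- the page Splash should load and render
    splash_base -- where the Splash container is listening
    wait        -- seconds Splash waits after page load before returning HTML
    """
    # urlencode percent-escapes the target URL so it survives as a query value
    query = urlencode({"url": target, "wait": wait})
    return f"{splash_base.rstrip('/')}/render.html?{query}"


if __name__ == "__main__":
    # Print the URL; open it in a browser or fetch it with curl to see
    # exactly the HTML that Splash produces for the page.
    print(splash_render_url("https://www.sreality.cz/en/search/for-sale/apartments"))
```

Fetching the printed URL shows the same HTML that a SplashRequest would hand to the spider, which separates "Splash cannot render this page" from "my selector is wrong".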

In the settings.py file in my Scrapy project directory, I added these lines, following the instructions at https://github.com/scrapy-plugins/scrapy-splash.

SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

I created a new spider in my project directory.

import scrapy
from scrapy_splash import SplashRequest

class FlatSpider(scrapy.Spider):
    name = "flat"
    def start_requests(self):
        # sreality url
        url = 'https://www.sreality.cz/en/search/for-sale/apartments'

        # beer test url
        # url = 'https://www.beerwulf.com/en-gb/c/mixedbeercases'

        yield SplashRequest(url=url, callback=self.parse, args={'wait': 0.5})

    def parse(self, response):

        # sreality variable
        foo = response.css('span.name.ng-binding::text').get()

        # beer test variable
        # foo = response.css('h4.product-name::text').get()

        print(foo)

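If I run this spider in the terminal with % scrapy crawl flat, it prints None, even though it should return text (which I can see in the Chrome inspector). Apart from that, everything seems to work: if I uncomment the two "beer test" lines, Splash successfully renders HTML that I can parse, and the code prints the text in the terminal.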
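A None result only says that the selector matched nothing in the HTML Splash returned. The ng-binding class is typically added by AngularJS to elements whose text comes from a data binding, so the listing names are filled in client-side; if rendering fails or the wait is too short, the span never reaches the response. One way to check is to save response.text to a file and search it for the element. The standard-library sketch below does that; find_text_by_classes is a hypothetical helper, and the default tag and classes are simply the ones from the selector above.

```python
from html.parser import HTMLParser


class _ClassTextFinder(HTMLParser):
    """Collect the text inside every <tag> that carries all the wanted classes."""

    def __init__(self, tag, classes):
        super().__init__()
        self.tag = tag
        self.classes = set(classes)
        self.depth = 0  # > 0 while inside a matching element
        self._buf = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested tags of the same name so we close at the right one
            if tag == self.tag:
                self.depth += 1
            return
        if tag == self.tag:
            cls = dict(attrs).get("class") or ""
            if self.classes.issubset(cls.split()):
                self.depth = 1
                self._buf = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def find_text_by_classes(html, tag="span", classes=("name", "ng-binding")):
    """Return the text of each matching element; [] means it is not in the HTML."""
    finder = _ClassTextFinder(tag, classes)
    finder.feed(html)
    finder.close()
    return finder.results
```

For example, writing response.text to page.html inside parse and then calling find_text_by_classes(open("page.html").read()) returns an empty list when the element is missing from Splash's output, which points at rendering rather than at the CSS selector.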

Also, when I open Splash at http://localhost:8050 and try to render https://www.sreality.cz/en/search/for-sale/apartments, it does not seem to work properly. It does, however, work for other websites.
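Since the page also fails in the Splash web UI, the cause is more likely in how Splash's browser engine handles this site than in the Scrapy settings. One workaround often suggested for such sites, not confirmed here for sreality.cz, is to render through Splash's execute endpoint with a short Lua script that disables private mode (some sites break when storage such as localStorage is unavailable) and waits longer. The Lua calls splash:go, splash:wait, splash:html and the splash.private_mode_enabled property belong to Splash's scripting API; the Python helper name splash_execute_args and the 5-second default are assumptions for illustration.

```python
# Lua script for Splash's /execute endpoint (sketch; the wait value is an assumption).
LUA_SOURCE = """
function main(splash, args)
  -- Some sites fail in private (incognito) mode, e.g. without localStorage.
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  -- A table with an 'html' key lets scrapy-splash use it as the response body,
  -- so response.css() in the spider keeps working unchanged.
  return {html = splash:html()}
end
"""


def splash_execute_args(wait=5.0):
    """Build the args dict for a SplashRequest with endpoint='execute' (hypothetical helper)."""
    return {"lua_source": LUA_SOURCE, "wait": wait}


# Usage in FlatSpider.start_requests (requires scrapy-splash; url is passed to
# the script as args.url automatically):
#     yield SplashRequest(url=url, callback=self.parse,
#                         endpoint="execute", args=splash_execute_args())
```

The same script can be pasted into the Splash web UI at http://localhost:8050 first, which is a quick way to see whether disabling private mode changes anything for this site.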

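For some reason, this scraping setup does not work for the particular site I am interested in. I am trying to figure out why, and how to get a response from this site that I can easily parse with response.css.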

In case it matters, I am running this on macOS 13.0.1 on Apple silicon.

Answer 1

I have tried Splash before, but its community is no longer active. There is a better plugin for scraping interactive websites: scrapy-playwright.
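For reference, the scrapy-playwright README describes the setup roughly as follows: install the package and a browser (pip install scrapy-playwright, then playwright install chromium), route HTTP and HTTPS downloads through its handler, and switch Scrapy to the asyncio reactor. The fragment below is a sketch of those settings.py additions, based on that README rather than tested against this site.

```python
# settings.py additions for scrapy-playwright (sketch based on its README)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted's asyncio-based reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Each request then opts in with meta={"playwright": True} on a plain scrapy.Request. To wait for the client-side names before parsing, the meta can also carry playwright_page_methods, for example [PageMethod("wait_for_selector", "span.name")] with PageMethod imported from scrapy_playwright.page; the selector is taken from the question and may need adjusting.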


Solution 1
1 2022-12-31 12:52:56
