Unable to scrape a website that uses JavaScript styled-components

Question

My goal

Get basic information from this page using the Scrapy framework, although the question is not specific to this framework. Let's take the p element inside the h1 node as an example.

Issue

Every selection I make on the response returned by my Scrapy requests fails to return what is inside the h1 node.

scrapy shell 'url'
response
>>> 200
response.xpath('//h1/p')
>>> []

Fetching the response:

When fetching the response, I see a structure I can't really understand: all the main HTML markup is condensed and placed just after a bunch of JavaScript styled-components. The file is here (line 1725).
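One way to check whether the condensed markup in the raw body still carries the h1/p content, before any JavaScript runs, is to walk it with a plain HTML parser. The following is only a minimal sketch using Python's standard library: RAW_BODY is a hypothetical stand-in for the fetched file (style rules first, then markup on one line), and PathText and text_under are illustrative names, not part of Scrapy.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the fetched body: style rules first,
# then the page markup condensed onto a single line.
RAW_BODY = (
    "<html><head><style>.zWwUF{color:red;}</style></head>"
    '<body><div id="root"><h1>12-14 31st Avenue, Unit 2 '
    "<p>placeholder</p></h1></div></body></html>"
)

# Elements that never get a closing tag, so they must not be pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathText(HTMLParser):
    """Record each non-blank text node with the open tags above it."""

    def __init__(self):
        super().__init__()
        self.path = []    # currently open tags, outermost first
        self.found = []   # (tuple_of_open_tags, stripped_text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.path:
            while self.path.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.found.append((tuple(self.path), data.strip()))


def text_under(html, tail):
    """Return texts whose innermost open tags equal `tail`, e.g. ["h1", "p"]."""
    parser = PathText()
    parser.feed(html)
    parser.close()
    want = tuple(tail)
    return [text for path, text in parser.found if path[-len(want):] == want]


print(text_under(RAW_BODY, ["h1", "p"]))
```

If the same check on the real response body finds the text, the content is already in the server-rendered HTML and the problem would lie in how the selector or parser sees the markup rather than in missing JavaScript rendering.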

My process

Testing the selector from the dev tools:

After disabling JavaScript in the dev tools and testing my selector, I get the desired result. For example, I get the <p> element inside the <h1> with a simple //h1/p query from the console.

Testing the selector with scrapy shell:

Not working, see Issue.

Testing the selector with Splash:

I get exactly the same result as shown in the Issue.

Answer 1

I can't explain the error, but I can hopefully provide an answer to your problem:

response.xpath('//*[@class="summary__StyledAddress-e4c4ok-6 zWwUF textIntent-title1"]/text()').get()

returns: '12-14 31st Avenue, Unit 2 '

Which is hopefully what you need?

Dr P.
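The selector in this answer matches the full class attribute exactly. Assuming the short suffixes in those class names (such as zWwUF) are generated by styled-components and may change between builds, matching only the readable prefix could be less brittle; in XPath that would be //*[contains(@class, "summary__StyledAddress")]/text(). The sketch below illustrates the same prefix idea with Python's standard library only. SAMPLE_HTML is hypothetical markup modelled on the class names quoted above, and ClassPrefixText and first_text are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the class names quoted in the answer.
SAMPLE_HTML = (
    '<h1 class="summary__StyledAddress-e4c4ok-6 zWwUF textIntent-title1">'
    "12-14 31st Avenue, Unit 2 <p>placeholder</p></h1>"
)

# Elements that never get a closing tag, so they must not be pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassPrefixText(HTMLParser):
    """Collect direct text of elements having a class token with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.stack = []   # one bool per open element: does it match?
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(any(c.startswith(self.prefix) for c in classes))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep only text whose immediate parent matches, like XPath's /text().
        if self.stack and self.stack[-1]:
            self.texts.append(data)


def first_text(html, prefix):
    """Return the first matching text node, or None if nothing matches."""
    parser = ClassPrefixText(prefix)
    parser.feed(html)
    parser.close()
    return parser.texts[0] if parser.texts else None


print(repr(first_text(SAMPLE_HTML, "summary__StyledAddress")))
```

Calling .strip() on the result would remove the trailing space that also shows up in the value returned above.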

1 accepted, 2020-12-15 17:17:10
