Unable to scrape a website with styled-components JavaScript
I'm trying to get basic information from this page using the Scrapy framework, but the question is not specific to this framework. Let's take the p element inside the h1 node as an example.
All the selections I make on the response I get from my Scrapy requests fail to return what's inside the h1 node:
scrapy shell 'url'

>>> response
<200 url>
>>> response.xpath('//h1/p')
[]
Fetching the response:

When fetching the response, I see a structure I can't really understand, with all the main HTML markup condensed and placed just after a bunch of JavaScript styled-components. The file is here (line 1725).
After disabling JavaScript from the dev tools and testing my selector, I get the desired result. For example, I get the <p> element inside the <h1> with a simple //h1/p query from the console.
Not working; see the issue. I get the exact same result as shown in the issue.
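This mismatch is typical of JavaScript-heavy pages: the DOM that the browser console queries is not the raw HTML that Scrapy downloads, so an XPath that works in dev tools can return nothing on the response. A minimal sketch of the effect, using lxml and a hypothetical fragment modeled on the class names quoted in the answer (the real page's markup will differ):

```python
from lxml import html

# Hypothetical raw HTML as a server might send it: the address text sits
# directly under the styled-component <h1>, with no <p> child yet -- the
# <p> only appears in the DOM after the page's JavaScript runs.
raw = (
    '<div><h1 class="summary__StyledAddress-e4c4ok-6 zWwUF textIntent-title1">'
    '12-14 31st Avenue, Unit 2 </h1></div>'
)
doc = html.fromstring(raw)

# The browser-derived selector finds nothing in the raw markup...
assert doc.xpath('//h1/p') == []

# ...but the text is still there, just attached to a different node.
print(doc.xpath('//h1/text()')[0].strip())
```

When a selector that works in the browser fails in Scrapy, inspecting response.text (or the file saved by scrapy shell's view(response)) shows the structure the spider actually has to work with.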
I can't explain the error, but I can hopefully provide an answer to your problem:

response.xpath('//*[@class="summary__StyledAddress-e4c4ok-6 zWwUF textIntent-title1"]/text()').get()

returns: '12-14 31st Avenue, Unit 2 '

Which is hopefully what you need?
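One caveat with that selector: styled-components generate hashed class-name fragments (like "zWwUF" here), and those hashes can change whenever the site is rebuilt. Matching only the stable prefix with contains() is a common way to make the selector less brittle. A sketch using lxml on a hypothetical fragment built from the class names above:

```python
from lxml import html

# Hypothetical markup echoing the class attribute quoted in the answer;
# the hashed parts ("e4c4ok-6", "zWwUF") may change between site builds.
raw = (
    '<div><h1 class="summary__StyledAddress-e4c4ok-6 zWwUF textIntent-title1">'
    '12-14 31st Avenue, Unit 2 </h1></div>'
)
doc = html.fromstring(raw)

# contains() keys on the stable component name, not the hashed suffix.
addr = doc.xpath('//*[contains(@class, "summary__StyledAddress")]/text()')[0].strip()
print(addr)  # 12-14 31st Avenue, Unit 2
```

The Scrapy equivalent of the same idea would be response.xpath('//*[contains(@class, "summary__StyledAddress")]/text()').get().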
Dr P.