How do I scrape data with Scrapy when the content I want does not appear in the page's HTML source?
I'm trying to get the user names and the comment text from this page:
The user names and comment text that I need to extract:
When I test the extraction with the Chrome plugin XPath Helper, I get the user names with this expression:
//*[@id="livefyre"]/div/div/div/div/article/div/header/a/span
and I get the comments with:
//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p
When I run the same test in the Scrapy shell with the query:
response.xpath('//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p').extract()
I get an empty list: []
I've also tried:
response.xpath('//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p/text()').extract()
The same thing happens in my spider code.
Checking the page source, I see that none of those comments exist in the HTML.
When I inspect the page in the browser, for example, I see the comment text:
But when I view the HTML source of the page, I do not see anything:
Where am I making a mistake?
Thanks for your help.
As you noted, the comments are not in the page's HTML source, which means they are rendered with JavaScript after the page loads. There are two ways to scrape this kind of site:
First, use scrapy-splash to render the JavaScript.
Second, find the API/network call that loads the comments and reproduce that request in Scrapy to get your data.
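As a rough sketch of the second approach: once you have found the comment request in the browser's Network tab, the response is often JSON, and extracting the fields is plain dictionary access. The JSON layout below (a "comments" list with "author", "name" and "body" keys) and the names SAMPLE_BODY and parse_comments are made up for illustration; the real endpoint and its field names have to be read from the actual response.

```python
import json

# Hypothetical response body. The real structure must be copied from the
# request you find in the browser's Network tab; these keys are assumptions.
SAMPLE_BODY = """
{"comments": [
  {"author": {"name": "alice"}, "body": "First comment"},
  {"author": {"name": "bob"}, "body": "Second comment"}
]}
"""


def parse_comments(body):
    """Return a list of {'user', 'text'} dicts from a JSON response body."""
    data = json.loads(body)
    items = []
    for comment in data.get("comments", []):
        # Guard against a missing or null author object.
        author = comment.get("author") or {}
        items.append({"user": author.get("name"), "text": comment.get("body")})
    return items


# Inside a Scrapy spider this would be wired up roughly like so
# (api_url is whatever URL the Network tab shows for the comments request):
#
#     yield scrapy.Request(api_url, callback=self.parse_api)
#
#     def parse_api(self, response):
#         yield from parse_comments(response.text)

if __name__ == "__main__":
    for item in parse_comments(SAMPLE_BODY):
        print(item)
```

Keeping the parsing in a plain function means you can test it against a saved copy of the response before pointing the spider at the live endpoint.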