How to scrape data with Scrapy from a web page when the content to scrape does not appear in the page source

Question

I'm trying to get the user names and the comment text from this page:

User and text that I need to extract:

When I test the extraction with the Chrome plugin XPath Helper, I get the user names with the expression:

//*[@id="livefyre"]/div/div/div/div/article/div/header/a/span

and I get the comments with:

//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p

When I run the test in the Scrapy shell with the query:

response.xpath('//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p').extract()

I get an empty list, [].

I've also tried:

response.xpath('//*[@id="livefyre"]/div/div/div/div/article/div/section/div/p/text()').extract()

The same thing happens with my code.

Checking the page source, I see that none of those comments exist in the HTML.

When I inspect the page, for example, I see the comment text:

But when I check the HTML source of the page, I do not see anything:

Where am I making a mistake?

Thanks for the help.

Answer 1

As you stated, there are no comments in the page source. That means the comments are rendered through JavaScript after the page loads, so they are not in the HTML that Scrapy downloads. There are two ways you can scrape this kind of website.

First, use scrapy-splash to render the JavaScript.

Second, find the API/network call that loads the comments, and reproduce that request in Scrapy to get your data.
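As a rough sketch of the second approach: once the browser's Network tab shows which request returns the comments, the response is often JSON and can be parsed directly instead of using XPath. The payload shape below (a "comments" list with "author" and "text" fields) and the function name parse_comments are assumptions made up for illustration; the real field names have to be read from the actual response.

```python
import json

# Hypothetical payload: the real structure must be copied from the
# request seen in the browser's Network tab.
SAMPLE_PAYLOAD = """
{"comments": [
  {"author": "alice", "text": "First comment"},
  {"author": "bob", "text": "Second comment"}
]}
"""


def parse_comments(payload_text):
    """Turn the JSON text of a comments response into a list of items."""
    data = json.loads(payload_text)
    items = []
    # .get() with a default keeps the function safe when a key is missing.
    for comment in data.get("comments", []):
        items.append({
            "user": comment.get("author"),
            "text": comment.get("text"),
        })
    return items


if __name__ == "__main__":
    for item in parse_comments(SAMPLE_PAYLOAD):
        print(item["user"], "-", item["text"])
```

In a spider, the same function could be called from a callback as parse_comments(response.text), after requesting the comments URL with scrapy.Request instead of the page URL.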

Answer 1
2 2019-01-02 18:01:12
