Web scraping with R: can't see the HTML

Question

I am trying to use R to scrape a website:

http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234

The page has several fields with lots of information. I am only interested in the URL shown above the field "site do candidato". In this example, the URL I want is "http://vanderlansenador111.com.br".

The problem is that there is no visible HTML, so I don't think rvest will help (at least, I don't know how to use it here). Is there a way to scrape the page without using Selenium? I have never used RSelenium and had some problems trying to run it.

Pointers in any direction are much appreciated.

Answer 1

Don't waste your time with Selenium. The page loads its data with JavaScript, so use your browser's Developer Tools to find the XHR request it makes: http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2018/GO/2022802018/candidato/90000609234

and just use jsonlite::fromJSON():

str(jsonlite::fromJSON("http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2018/GO/2022802018/candidato/90000609234"))

The str() output is large and complete. You should be able to find what you need there.
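As a rough sketch of how this could be taken one step further, the snippet below rebuilds the REST URL from the page URL and pulls the website out of the parsed result. Both helpers (rest_url and extract_sites) are hypothetical names, the segment order is only inferred from the two URLs in this thread, and the field name "sites" is an assumption that should be checked against the str() output. The sample JSON is an offline stand-in with an assumed shape, not the real response.

```r
library(jsonlite)

# Hypothetical helper: turn the page URL (with its #/candidato/... fragment)
# into the REST URL. The reordering of segments is inferred from the two
# URLs quoted in this thread, not from any API documentation.
rest_url <- function(page_url) {
  fragment <- sub("^.*#/candidato/", "", page_url)
  parts <- strsplit(fragment, "/", fixed = TRUE)[[1]]
  # parts (presumably): year, election id, state, candidate id
  sprintf(
    "http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/%s/%s/%s/candidato/%s",
    parts[1], parts[3], parts[2], parts[4]
  )
}

# Hypothetical helper: pull the website(s) out of the parsed JSON.
# Assumption: the URLs live in a top-level field called "sites";
# change `field` if str() shows a different name.
extract_sites <- function(candidate, field = "sites") {
  sites <- candidate[[field]]
  if (is.null(sites)) character(0) else as.character(unlist(sites))
}

page <- "http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234"
api <- rest_url(page)
print(api)

# Offline stand-in for the real response (assumed shape, for illustration only).
sample_json <- '{"sites": ["http://vanderlansenador111.com.br"]}'
candidate <- fromJSON(sample_json)
print(extract_sites(candidate))

# Against the live endpoint (needs network access):
# extract_sites(fromJSON(api))
```

If the field turns out to be nested or named differently, only extract_sites needs to change; the URL rewriting stays the same and can be applied over a vector of candidate pages with lapply().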

Answer 2

Selenium is a good choice for this. An alternative is PhantomJS; there is a good tutorial on the process over at DataCamp (not as clean a solution as Selenium):

https://www.datacamp.com/community/tutorials/scraping-javascript-generated-data-with-r

2 answers

Answer 1: score 3, accepted, 2018-08-25 02:50:51

Answer 2: score 1, 2018-08-25 02:08:36
