Web scraping with R - no HTML visible
I am trying to use R to scrape a website:
http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234
It has many fields with lots of information. I am only interested in the URL above the field "site do candidato". In this example, the URL I want is: "http://vanderlansenador111.com.br"
The problem is that there is no (visible) HTML, so I don't think rvest is helpful (at least, I don't know how to use it here).
Is there a way to scrape it without using Selenium? (I have never used RSelenium and had some problems trying to run it.)
Any pointers in the right direction are much appreciated.
Don't waste your time with Selenium. Use the Developer Tools in your browser to find the XHR request:
http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2018/GO/2022802018/candidato/90000609234
and just use jsonlite::fromJSON():
str(jsonlite::fromJSON("http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2018/GO/2022802018/candidato/90000609234"))
The str() output is large and complete. You should be able to find what you need there.
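A minimal sketch of the same approach in R, under stated assumptions. `page_to_rest_url()` and `extract_sites()` are hypothetical helper names, not part of any package. The URL rewrite is inferred only from the single page/REST URL pair in this thread, so other elections may not follow the same pattern. The `"sites"` field name is also an assumption: check the `str()` output for the real field holding the "site do candidato" URL before relying on it.

```r
library(jsonlite)

# Hypothetical helper: turn the browser URL of a candidate page into the REST
# endpoint that the page fetches via XHR. The segment order is inferred from
# the one example pair in this thread:
#   .../divulga/#/candidato/<year>/<election>/<state>/<candidate>
#   .../divulga/rest/v1/candidatura/buscar/<year>/<state>/<election>/candidato/<candidate>
page_to_rest_url <- function(page_url) {
  parts <- strsplit(page_url, "#/candidato/", fixed = TRUE)[[1]]
  base <- parts[1]                                   # e.g. ".../divulga/"
  seg <- strsplit(parts[2], "/", fixed = TRUE)[[1]]  # year, election, state, candidate
  paste0(base, "rest/v1/candidatura/buscar/",
         seg[1], "/", seg[3], "/", seg[2], "/candidato/", seg[4])
}

# Hypothetical helper: return the candidate's website URL(s) from the parsed JSON.
# ASSUMPTION: the URLs sit in a top-level field named "sites"; confirm the real
# field name in the str() output first and pass it via `field` if it differs.
extract_sites <- function(candidate, field = "sites") {
  sites <- candidate[[field]]
  if (is.null(sites)) return(character(0))  # field absent: no URLs
  as.character(unlist(sites, use.names = FALSE))  # flatten to a character vector
}

page_url <- "http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/GO/90000609234"
rest_url <- page_to_rest_url(page_url)

# Live request (needs network access; the endpoint may have changed since 2018):
# candidate <- fromJSON(rest_url)
# extract_sites(candidate)
```

For the page above, `rest_url` is exactly the XHR URL quoted in this answer. Offline, `extract_sites(fromJSON('{"sites": ["http://vanderlansenador111.com.br"]}'))` returns `"http://vanderlansenador111.com.br"`.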
Selenium is a good choice for this. An alternative is PhantomJS; there is a good tutorial on the process over at DataCamp (not as clean a solution as Selenium):
https://www.datacamp.com/community/tutorials/scraping-javascript-generated-data-with-r