Reading the content of a JavaScript-rendered webpage into R

I am trying to read the content of the following webpage (as shown in the Inspect Element tool of my browser) into R:

Etoro Discover People

Since the content is apparently JavaScript-rendered, it cannot be retrieved with common web scraping functions such as read_html from the xml2 package. I have come across the following post, which suggests using the rvest and V8 packages, but I could not get it to work for my problem:

https://datascienceplus.com/scraping-javascript-rendered-web-content-using-r/

I have also seen very similar questions on Stack Overflow (like this and this), but the answers to those questions (the hidden API solution and the Request URL in the Network tab) did not work for me.

For starters, I am interested in reading the public ID of people in the list (the div.user-nickname node). My guess is that either I am specifying the node incorrectly or the website does not allow web scraping at all.

Any help would be greatly appreciated.

The data comes from an API call that returns JSON. You can make the same GET request and then extract the usernames. Swap x$UserName for x$CustomerId to get the IDs instead.

library(jsonlite)

data <- jsonlite::read_json('https://www.etoro.com/sapi/rankings/rankings/?activeweeksmin=24&blocked=false&bonusonly=false&copiersmax=5000&copyblock=false&copyinvestmentpctmax=0&copytradespctmax=0&dailyddmin=-10&displayfullname=true&gainmax=100&gainmin=5&hasavatar=true&highleveragepctmax=10&isfund=false&istestaccount=false&lastactivitymax=14&longpospctmax=80&lowleveragepctmin=50&maxdailyriskscoremax=5&maxmonthlyriskscoremax=5&maxmonthlyriskscoremin=1&optin=true&page=1&pagesize=20&period=OneYearAgo&profitableweekspctmin=50&sort=-gain&tradesmin=20&verified=true&weeklyddmin=-20&winratiomax=85')

users <- lapply(data$Items, function(x) {x$UserName})
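
Since lapply returns a list, it may be more convenient to collect the fields into plain vectors. The following is a minimal sketch of that step, assuming the response has the shape used above (an Items element whose entries carry UserName and CustomerId). The data object here is a hypothetical stand-in with made-up values so that the example runs without a network call; with the real response you would skip that assignment.

```r
# Hypothetical stand-in for the list returned by jsonlite::read_json();
# the field values below are invented for illustration only.
data <- list(
  Items = list(
    list(UserName = "trader_one", CustomerId = 101L),
    list(UserName = "trader_two", CustomerId = 102L)
  )
)

# vapply() returns a plain vector and stops with an error if an entry
# is missing the field or the field is not a single value.
user_names   <- vapply(data$Items, function(x) x$UserName, character(1))
customer_ids <- vapply(data$Items, function(x) as.integer(x$CustomerId), integer(1))

# Combine both fields into one data frame for further work.
users_df <- data.frame(
  UserName = user_names,
  CustomerId = customer_ids,
  stringsAsFactors = FALSE
)
print(users_df)
```

Using vapply rather than lapply is a design choice: it fails early if the API response changes shape, instead of silently producing NULL entries.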
