
Reading the content of a Javascript-rendered webpage into R

I am trying to read the content of the following webpage (as shown in the Inspect Element tool of my browser) into R:

Etoro Discover People

Since the content is apparently JavaScript-rendered, it cannot be retrieved with common web scraping functions such as read_html from the xml2 package. I came across the following post, which suggests using the rvest and V8 packages, but I could not get it to work for my problem:

https://datascienceplus.com/scraping-javascript-rendered-web-content-using-r/

I have also seen very similar questions on Stack Overflow (like this and this), but the answers to those questions (the hidden API solution and the Request URL in the Network tab) did not work for me.

For starters, I am interested in reading the public ID of people in the list (the div.user-nickname node). My guess is that either I am specifying the node incorrectly or the website does not allow web scraping at all.

Any help would be greatly appreciated.

The data comes from an API call that returns JSON. You can make the same GET request and then extract the usernames. Swap x$UserName for x$CustomerId to get the IDs instead.

library(jsonlite)

data <- jsonlite::read_json('https://www.etoro.com/sapi/rankings/rankings/?activeweeksmin=24&blocked=false&bonusonly=false&copiersmax=5000&copyblock=false&copyinvestmentpctmax=0&copytradespctmax=0&dailyddmin=-10&displayfullname=true&gainmax=100&gainmin=5&hasavatar=true&highleveragepctmax=10&isfund=false&istestaccount=false&lastactivitymax=14&longpospctmax=80&lowleveragepctmin=50&maxdailyriskscoremax=5&maxmonthlyriskscoremax=5&maxmonthlyriskscoremin=1&optin=true&page=1&pagesize=20&period=OneYearAgo&profitableweekspctmin=50&sort=-gain&tradesmin=20&verified=true&weeklyddmin=-20&winratiomax=85')

# Pull the UserName field out of each item (lapply returns a list)
users <- lapply(data$Items, function(x) {x$UserName})
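
If you want both fields at once, one option is to flatten the parsed items into a data frame. The following is only a sketch: items_to_df is a hypothetical helper name, and the mock list stands in for the parsed response so the example runs without a network call. It assumes every item carries both a UserName and a CustomerId field, as the answer above suggests; the values shown are made up.

```r
# Hypothetical helper: flatten the parsed `Items` list into a data frame.
# Assumes each item has UserName and CustomerId; vapply() will error
# if either field is missing, which makes bad input easy to spot.
items_to_df <- function(items) {
  data.frame(
    UserName   = vapply(items, function(x) as.character(x$UserName), character(1)),
    CustomerId = vapply(items, function(x) as.character(x$CustomerId), character(1)),
    stringsAsFactors = FALSE
  )
}

# Mock data shaped like the parsed response (values are invented).
mock <- list(Items = list(
  list(UserName = "trader_a", CustomerId = 101),
  list(UserName = "trader_b", CustomerId = 102)
))

users_df <- items_to_df(mock$Items)
print(users_df)
```

With the real response you would call items_to_df(data$Items). The IDs are kept as character here so they are treated as labels rather than numbers.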
