I'm trying to get a table from coinmarketcap.com using the rvest package.
The approach shown below used to work, but it no longer does: the resulting table is empty. Apparently the website has changed.
Can anyone provide a solution?
Many thanks in advance!
library(rvest)
library(tidyverse)
library(xml2)
url <- "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
table <- url %>%
  read_html() %>%
  html_table() %>%
  as.data.frame()
The webpage now loads its content dynamically, so you need to use RSelenium and not just rvest.
This code works for me:
url <- "https://coinmarketcap.com/currencies/bitcoin/historical-data/"
# RSelenium with Firefox
rD <- RSelenium::rsDriver(browser = "firefox", port = 4546L, verbose = FALSE)
remDr <- rD[["client"]]
remDr$navigate(url)
# give the page time to render its dynamic content
Sys.sleep(4)
# get the page source
web <- remDr$getPageSource()
web <- xml2::read_html(web[[1]])
table <- html_table(web) %>%
  as.data.frame()
# close RSelenium
remDr$close()
gc()
rD$server$stop()
# Windows only: kill any leftover Java process started by the Selenium server
system("taskkill /im java.exe /f", intern = FALSE, ignore.stdout = FALSE)
You don't need the overhead of a browser. You can mimic the API call and parse the JSON response.
library(jsonlite)
library(tidyverse)
data <- jsonlite::read_json('https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical?id=1&convert=USD&time_start=1614297600&time_end=1619395200')$data$quotes
df <- map_df(data, function(x) {data.frame(x$quote)})
print(df)
# 1614297600 is Fri Feb 26 2021 00:00:00 GMT+0000 for 2021-02-27
# 1619395200 is Mon Apr 26 2021 00:00:00 GMT+0000 for 2021-04-25
The time_start and time_end parameters are Unix timestamps with what looks like a one-day offset, though you will need to explore how this works and whether the offset varies across bank holidays and weekends.
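To avoid hard-coding the timestamps, the conversion between calendar dates and Unix time can be done in base R. The following is only a minimal sketch: `to_unix` and `build_url` are hypothetical helper names, not part of any package, and the query parameters are simply copied from the URL above. It does not account for the apparent one-day offset, which you would still need to check against the data that comes back.

```r
# Hypothetical helper: calendar date (UTC midnight) -> Unix timestamp in seconds
to_unix <- function(date) {
  as.numeric(as.POSIXct(date, tz = "UTC"))
}

# Hypothetical helper: assemble the request URL from the same query
# parameters that appear in the URL used in the answer above
build_url <- function(id, start_date, end_date, convert = "USD") {
  sprintf(
    "https://web-api.coinmarketcap.com/v1/cryptocurrency/ohlcv/historical?id=%d&convert=%s&time_start=%.0f&time_end=%.0f",
    as.integer(id), convert, to_unix(start_date), to_unix(end_date)
  )
}

api_url <- build_url(1, "2021-02-26", "2021-04-26")
print(api_url)

# Going the other way: Unix timestamp -> date-time, to inspect the offset
print(as.POSIXct(1614297600, origin = "1970-01-01", tz = "UTC"))
```

The resulting `api_url` can be passed to `jsonlite::read_json()` exactly as in the answer above.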