How to store results from a loop for webscraping using rvest in R

I'm trying to import data from several pages of the same website, one page per year.

# web scraping for the HDI (IDH)

algo <- c(1996:2017)

idh_link <- c(paste0("https://datosmacro.expansion.com/idh?anio=", 1996:2017))
final <- vector(length = length(idh_link))

for (i in seq_along(algo)) {
idh_desc <- read_html(idh_link[i])

pais <- idh_desc %>% 
  html_nodes("td:nth-child(1), .header:nth-child(1)") %>% 
  html_text()

idhaño <- idh_desc %>% 
  html_nodes("td:nth-child(2), .header:nth-child(2)") %>% 
  html_text()

final[i] <- tibble(pais, idhaño)
}

In this case, it only retrieves the information from the first link and doesn't create the tibble at the end of the loop (the idea is to do an inner join across all the tibbles).

I'm using library(rvest) for the web scraping.

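Atomic vectors cannot store data frames or tibbles. They can only hold atomic values, such as integers or character strings. Note that vector(length = n) creates a logical vector by default, so final[i] <- tibble(...) has no slot that can hold a whole table.

To store a series of data frames, it is best to use a list.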
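The difference can be checked on a small made-up tibble. This is only a sketch: the names v, l, and df are hypothetical and not part of the original code, and the warning is suppressed just to keep the output quiet.

```r
library(tibble)

# Toy table standing in for one scraped page (values are made up).
df <- tibble(x = 1:3, y = c("a", "b", "c"))

# A logical vector, like the one vector(length = 2) creates.
v <- vector(length = 2)
# Single-bracket assignment does not store the tibble as one element;
# R warns that the replacement length does not match.
suppressWarnings(v[1] <- df)
is.data.frame(v[[1]])   # FALSE

# A list holds the whole tibble in one element when [[ ]] is used.
l <- vector("list", 2)
l[[1]] <- df
is.data.frame(l[[1]])   # TRUE
```

Using [[ ]] on a list is what lets each loop iteration keep its full table.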

library(rvest)   # read_html(), html_nodes(), html_text(), %>%
library(tibble)  # tibble()

algo <- 1996:2017

idh_link <- paste0("https://datosmacro.expansion.com/idh?anio=", algo)
# data structure to store a series of data frames
final <- list()

for (i in seq_along(algo)) {
   idh_desc <- read_html(idh_link[i])
   
   pais <- idh_desc %>% 
      html_nodes("td:nth-child(1), .header:nth-child(1)") %>% 
      html_text()
   
   idhaño <- idh_desc %>% 
      html_nodes("td:nth-child(2), .header:nth-child(2)") %>% 
      html_text()
   
   # name each list element after its year
   final[[as.character(algo[i])]] <- tibble(pais, idhaño)

   # pause between requests so as not to overload the server
   Sys.sleep(1)
}

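To combine all of the data frames stored in the list, I would recommend either bind_rows() or bind_cols() from the dplyr package. bind_rows() stacks the tables on top of each other, while bind_cols() places them side by side and assumes the rows are in the same order in every table.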
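As a rough sketch of both options, here is a toy version of the list with made-up values. The column name idh and the objects idh_long, renamed, and idh_wide are assumptions for illustration, not names from the original code. The second half shows one possible way to get the inner join mentioned in the question, using base R's Reduce() with dplyr's inner_join().

```r
library(dplyr)

# Toy stand-in for the scraped list: names are years, values are made up.
final <- list(
  "1996" = tibble(pais = c("A", "B"), idh = c("0.70", "0.80")),
  "1997" = tibble(pais = c("A", "B"), idh = c("0.71", "0.81"))
)

# Long format: stack the tables and keep the list names in a "year" column.
idh_long <- bind_rows(final, .id = "year")

# Wide format: suffix each value column with its year so the names stay
# unique, then inner-join the tables one after another on "pais".
renamed <- Map(
  function(df, yr) rename_with(df, function(nm) paste0(nm, "_", yr), .cols = -pais),
  final, names(final)
)
idh_wide <- Reduce(function(x, y) inner_join(x, y, by = "pais"), renamed)
```

The join matches countries by name, so it does not depend on row order the way bind_cols() does. With the real scraped data, the header row captured by the ".header" selector would likely need to be dropped first.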
