How do I loop over a data frame of URLs using rvest in R?

First I scrape a number of URLs from a website and collect them into a data frame. I then want to loop over those collected URLs and scrape each page. This is my code:

library(rvest)
library(dplyr)
library(XLConnect)
##########GET URLS###################################################################################
urls <- read_html("http://www.klassiekshop.nl/labels/labels-a-e/brilliant-classics/?limit=all")

urls <- urls %>% 
  html_nodes(".product-name a") %>% 
  html_attr("href") %>%
  as.character()

url <- as.data.frame(urls)
as.character(url$urls)


#########EXTRACT URLS FROM DATAFRAME URLS############################################################
#########CREATE DATAFRAME############################################################################
EAN <- 0
price <- 0

df <- data.frame(EAN, price)

#########GET DATA####################################################################################
pricing_data <- for(i in urls){

site <-read_html(i)
print(i)
  stats <- data.frame(EAN =site %>% html_node("b") %>% html_text() ,
               price =site %>% html_node(".price") %>% html_text() ,
               stringsAsFactors=FALSE)
 data <-rbind(df,stats)
}

When I debug it, the loop does run over the URLs, but it doesn't collect the data. Does anyone know how to get the data from the site?

Thanks!

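That is because you rbind stats onto df, but you never change df: every iteration overwrites data with the original df plus only the current row. I think you want to change the last line of your loop to: df <- rbind(df, stats)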
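For illustration, here is a minimal sketch of accumulating one row per page. It is not the code from the question: extract_stats is a hypothetical helper, and the two inline HTML snippets (with made-up EAN and price values) stand in for the real product pages so the sketch runs offline. It also swaps in a different pattern, collecting rows in a list and binding them once at the end, which avoids growing df inside the loop and avoids the placeholder 0/0 row that the original df starts with.

```r
library(rvest)

# Hypothetical helper: extract the EAN and price from one parsed page,
# using the same selectors as the question ("b" and ".price").
extract_stats <- function(page) {
  data.frame(
    EAN   = page %>% html_node("b") %>% html_text(),
    price = page %>% html_node(".price") %>% html_text(),
    stringsAsFactors = FALSE
  )
}

# Stand-in pages (assumed markup and values, not taken from the real site).
# With real data you would call read_html(u) for each URL u in urls.
pages <- list(
  read_html('<html><body><b>1111</b><span class="price">9,99</span></body></html>'),
  read_html('<html><body><b>2222</b><span class="price">12,50</span></body></html>')
)

# Collect one single-row data frame per page, then bind them all at once.
rows <- vector("list", length(pages))
for (k in seq_along(pages)) {
  rows[[k]] <- extract_stats(pages[[k]])
}
df <- do.call(rbind, rows)

print(df)
```

If you prefer to keep the loop from the question, the one-line change to df <- rbind(df, stats) should work too; just note that df would then still begin with the initial row of zeros.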


 