First I scrape a number of URLs from a website and collect them into a data frame. Then I want to loop over the URLs I collected. This is my code:
library(rvest)
library(dplyr)
library(XLConnect)
##########GET URLS###################################################################################
urls <- read_html("http://www.klassiekshop.nl/labels/labels-a-e/brilliant-classics/?limit=all")
urls <- urls %>%
html_nodes(".product-name a") %>%
html_attr("href") %>%
as.character()
url <- as.data.frame(urls)
as.character(url$urls)
#########EXTRACT URLS FROM DATAFRAME URLS############################################################
#########CREATE DATAFRAME############################################################################
EAN <- 0
price <- 0
df <- data.frame(EAN, price)
#########GET DATA####################################################################################
pricing_data <- for (i in urls) {
  site <- read_html(i)
  print(i)
  stats <- data.frame(EAN   = site %>% html_node("b") %>% html_text(),
                      price = site %>% html_node(".price") %>% html_text(),
                      stringsAsFactors = FALSE)
  data <- rbind(df, stats)
}
When debugging, the loop runs over the URLs, but it doesn't collect the data. Does anyone know how to get the data from the site?
Thanks!
This is because you are rbinding df to stats, but you never update df. I think you want to change the last line of your code to: df <- rbind(df, stats)
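To make that concrete, here is a minimal sketch of the corrected loop, assuming the same selectors as your code (`b` and `.price`) still match something on each product page:

```r
library(rvest)
library(dplyr)

# Start with an empty data frame instead of dummy 0 values.
df <- data.frame(EAN = character(0), price = character(0),
                 stringsAsFactors = FALSE)

for (i in urls) {
  site <- read_html(i)
  stats <- data.frame(EAN   = site %>% html_node("b") %>% html_text(),
                      price = site %>% html_node(".price") %>% html_text(),
                      stringsAsFactors = FALSE)
  df <- rbind(df, stats)  # reassign df so the rows accumulate across iterations
}
```

Note that growing a data frame with rbind inside a loop is slow for many URLs; a more idiomatic alternative is to build a list of one-row data frames with lapply and combine them once at the end with dplyr::bind_rows.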