I am downloading a list of URLs and looping over them. During the loop I use rbind to append each result to a data.frame. The way I made it work does not seem like the best approach, and I am wondering what other ways there are to accomplish this. To make the rbind work I had to take a copy of the file structure and use it as a blank template; there has to be a more R-like way to do this. So before I run the loop, I run the body once to get the structure and keep one dummy row: final <- final[1,]
library(stringr)
library(gdata)
library(XML)
# get the files for department of revenue OFM
url = "http://dor.wa.gov/Content/AboutUs/StatisticsAndReports/stats_taxretail.aspx"
# use xml to get the names of the files that are xls and xlsx that have data
links = htmlParse(url)
src = xpathApply(links, "//a[@href]", xmlGetAttr, "href")
xls.src = src[grep(".xls", src, fixed=T)]
# xls.src = xls.src[1:3] # testing size
base = "http://dor.wa.gov"
for (i in seq(xls.src)) {
  url = paste0(base, xls.src[[i]])
  download.file(url, destfile="file.xls")
  retail <- read.xls("file.xls", header=TRUE)
  names(retail) <- tolower(names(retail))
  retail <- retail[complete.cases(retail$location), c(1, 2, 5, 6)]
  retail$year <- paste0(unlist(str_extract_all(url, "\\(?[0-9]")), collapse="")
  names(retail)[3:4] <- c("firms", "taxable sales")
  final = rbind(final, retail) # final starts here with 1 row of dummy data
}
# this removes the dummy first row
wa.retail <- final[-1, ]
Rather than doing a for loop, use lapply to generate a list of data.frames. Then you can rbind them all at the end with do.call. Here's a sketch:
dfs <- lapply(xls.src, function(src) {
  download.file(paste0(base, src), destfile="file.xls") # hrefs are relative, so prepend the base URL
  read.xls("file.xls", header=TRUE)
})
final <- do.call(rbind, dfs)
Here dfs will be a list of data.frames generated by each call to read.xls. You can add back in all the data cleaning, but this is generally a better strategy.
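For completeness, here is a fuller sketch that folds the question's cleaning steps into the lapply body. It reuses the question's own calls and variables (base, xls.src, and the stringr/gdata libraries loaded above) and only moves them around, so treat it as an illustration rather than tested code:
dfs <- lapply(xls.src, function(src) {
  url <- paste0(base, src)
  download.file(url, destfile="file.xls")
  retail <- read.xls("file.xls", header=TRUE)
  names(retail) <- tolower(names(retail))
  retail <- retail[complete.cases(retail$location), c(1, 2, 5, 6)]
  retail$year <- paste0(unlist(str_extract_all(url, "\\(?[0-9]")), collapse="")
  names(retail)[3:4] <- c("firms", "taxable sales")
  retail # the last value is what lapply collects for each element
})
wa.retail <- do.call(rbind, dfs)
Note that no dummy template row is needed here: do.call(rbind, dfs) builds the final data.frame in one step, so there is nothing to strip out afterwards.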