
R append to data.frame during loop from list variables

I am downloading files from a list of URLs. To work through the list I use a loop, and during the loop I use rbind to append each result to a data.frame. The way I made it work does not seem like the best way, and I am wondering what other ways there are to accomplish this. To make the rbind work, I had to take a copy of the file structure and use it as a blank template: before I run the loop, I run the body once to get the structure and then keep a single row with final <- final[1,]. There has to be a more R way to do this.

library(stringr)
library(gdata)
library(XML)

# get the files for Department of Revenue OFM

url = "http://dor.wa.gov/Content/AboutUs/StatisticsAndReports/stats_taxretail.aspx"

# use xml to get the names of the files that are xls and xlsx that have data
links = htmlParse(url)
src = xpathApply(links, "//a[@href]", xmlGetAttr, "href")
xls.src = src[grep(".xls", src, fixed=TRUE)]
# xls.src = xls.src[1:3] # testing size

base = "http://dor.wa.gov" 
for (i in seq_along(xls.src)){
  url = paste0(base, xls.src[[i]])
  download.file(url, destfile="file.xls", mode="wb") # binary mode so .xls files are not corrupted on Windows
  retail <- read.xls("file.xls", header=TRUE)
  names(retail) <- tolower(names(retail))
  retail <- retail[complete.cases(retail$location),c(1,2, 5, 6)]
  retail$year <- paste0(unlist(str_extract_all(url, "\\(?[0-9]")), collapse="")
  names(retail)[3:4] <- c("firms", "taxable sales")
  final = rbind(final, retail) # final starts here with 1 row of dummy data
}
# drop the first (dummy) row
wa.retail <- final[-1, ]
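For comparison, the template row is not needed even if you keep the loop. When any argument is a data.frame, rbind uses its data.frame method, which ignores NULL arguments, so rbind(NULL, df) simply returns df and the accumulator can start as NULL. A minimal sketch, with small made-up data.frames (hypothetical data) standing in for the cleaned spreadsheets:

```r
# Start from NULL instead of a one-row dummy template.
final <- NULL
for (i in 1:3) {
  # Hypothetical chunk standing in for one cleaned spreadsheet
  chunk <- data.frame(file = i, firms = i * 10)
  # On the first pass this is rbind(NULL, chunk), which returns chunk unchanged,
  # so there is no dummy first row to drop afterwards
  final <- rbind(final, chunk)
}
final
```

Each rbind in the loop still copies everything accumulated so far, so with many files collecting a list and binding once, as in the answer, scales better.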

Rather than using a for loop, use lapply to generate a list of data.frames. Then you can rbind them all at the end with do.call. Here's a sketch:

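base <- "http://dor.wa.gov"  # the hrefs in xls.src are site-relative
dfs <- lapply(xls.src, function(src) {
    download.file(paste0(base, src), destfile="file.xls", mode="wb")
    read.xls("file.xls", header=TRUE)
})
final <- do.call(rbind, dfs)  # one bind at the end; needs matching column names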

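Here dfs is a list of data.frames, one per call to read.xls. You can add all the data cleaning back in inside the function (rbind needs every data.frame to have the same column names). This is generally a better strategy than growing a data.frame in a loop: there is no dummy row to create and remove, and the accumulated data.frame is not copied again on every rbind.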
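A sketch of the lapply version with the question's cleaning folded back in. The split into clean_retail() and fetch_retail() is a hypothetical refactoring, not part of the original answer. Base R's regmatches()/gregexpr() stand in for stringr's str_extract_all() with the same pattern, and each file goes to a temporary file instead of a fixed "file.xls". It assumes, as the question's code does, that every sheet has a location column and keeps columns 1, 2, 5 and 6.

```r
# Hypothetical helper: clean one sheet the same way the question's loop body does.
clean_retail <- function(retail, url) {
  names(retail) <- tolower(names(retail))
  # keep rows that have a location, and only columns 1, 2, 5 and 6
  retail <- retail[complete.cases(retail$location), c(1, 2, 5, 6)]
  # concatenate every digit found in the URL (same pattern as the question) as the year
  retail$year <- paste(regmatches(url, gregexpr("\\(?[0-9]", url))[[1]], collapse = "")
  names(retail)[3:4] <- c("firms", "taxable sales")
  retail
}

# Hypothetical helper: download one site-relative href from xls.src and clean it.
fetch_retail <- function(href, base = "http://dor.wa.gov") {
  url <- paste0(base, href)
  # keep the original extension (.xls or .xlsx) on the temporary file
  tmp <- tempfile(fileext = paste0(".", tools::file_ext(url)))
  on.exit(unlink(tmp))
  download.file(url, destfile = tmp, mode = "wb")
  clean_retail(gdata::read.xls(tmp, header = TRUE), url)
}

# Usage (needs network access and the gdata package):
# wa.retail <- do.call(rbind, lapply(xls.src, fetch_retail))
```

Because every element is cleaned before binding, all the data.frames share the same five column names, which is what the single do.call(rbind, ...) at the end relies on.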
