
Loading multiple files into R at the same time (with similar file names)

I am trying to load multiple files into an R environment. I have tried something like the following:

files <- list.files(pattern = ".Rda", recursive = TRUE)

lapply(files,load,.GlobalEnv)

This leaves only one data set per product in the environment, which is not what I want. The problem is that the files have the same names in every year, so the objects they contain overwrite each other. For example, "Year1/beer/beer.Rda" has a counterpart "Year2/beer/beer.Rda".

I am trying to rename the data files upon import so beer1 and beer2 will correspond to beer year 1 and beer year 2 etc.

Does anybody have a better method of loading the data? I have more than two years' worth of data.

File names:

 [1] "Year1/beer/beer.Rda"         "Year1/blades/blades.Rda"     "Year1/carbbev/carbbev.Rda"  
 [4] "Year1/cigets/cigets.Rda"     "Year1/coffee/coffee.Rda"     "Year1/coldcer/coldcer.Rda"  
 [7] "Year1/deod/deod.Rda"         "Year1/diapers/diapers.Rda"   "Year1/factiss/factiss.Rda"  
[10] "Year1/fzdinent/fzdinent.Rda" "Year1/fzpizza/fzpizza.Rda"   "Year1/hhclean/hhclean.Rda"  
[13] "Year1/hotdog/hotdog.Rda"     "Year1/laundet/laundet.Rda"   "Year1/margbutr/margbutr.Rda"
[16] "Year1/mayo/mayo.Rda"         "Year1/milk/milk.Rda"         "Year1/mustketc/mustketc.Rda"
[19] "Year1/paptowl/paptowl.Rda"   "Year1/peanbutr/peanbutr.Rda" "Year1/photo/photo.Rda"      
[22] "Year1/razors/razors.Rda"     "Year1/saltsnck/saltsnck.Rda" "Year1/shamp/shamp.Rda"      
[25] "Year1/soup/soup.Rda"         "Year1/spagsauc/spagsauc.Rda" "Year1/sugarsub/sugarsub.Rda"
[28] "Year1/toitisu/toitisu.Rda"   "Year1/toothbr/toothbr.Rda"   "Year1/toothpa/toothpa.Rda"  
[31] "Year1/yogurt/yogurt.Rda"     "Year2/beer/beer.Rda"         "Year2/blades/blades.Rda"    
[34] "Year2/carbbev/carbbev.Rda"   "Year2/cigets/cigets.Rda"     "Year2/coffee/coffee.Rda"    
[37] "Year2/coldcer/coldcer.Rda"   "Year2/deod/deod.Rda"         "Year2/diapers/diapers.Rda"  
[40] "Year2/factiss/factiss.Rda"   "Year2/fzdinent/fzdinent.Rda" "Year2/fzpizza/fzpizza.Rda"  
[43] "Year2/hhclean/hhclean.Rda"   "Year2/hotdog/hotdog.Rda"     "Year2/laundet/laundet.Rda"  
[46] "Year2/margbutr/margbutr.Rda" "Year2/mayo/mayo.Rda"         "Year2/milk/milk.Rda"        
[49] "Year2/mustketc/mustketc.Rda" "Year2/paptowl/paptowl.Rda"   "Year2/peanbutr/peanbutr.Rda"
[52] "Year2/photo/photo.Rda"       "Year2/razors/razors.Rda"     "Year2/saltsnck/saltsnck.Rda"
[55] "Year2/shamp/shamp.Rda"       "Year2/soup/soup.Rda"         "Year2/spagsauc/spagsauc.Rda"
[58] "Year2/sugarsub/sugarsub.Rda" "Year2/toitisu/toitisu.Rda"   "Year2/toothbr/toothbr.Rda"  
[61] "Year2/toothpa/toothpa.Rda"   "Year2/yogurt/yogurt.Rda"

One solution is to parse the file names and assign them as names to elements in a list of data frames. We'll use some sample data that has monthly sales for beer brands across two years that were saved as CSV files into two subdirectories, year1 and year2 .

We will use lapply() to read the files into a list of data frames, and then use the names() function to name each element by prefixing year<x>. to the file name (excluding .csv ).

fileList <- c("year1/beer.csv","year2/beer.csv")

data <- lapply(fileList,function(x){
     read.csv(x)
})
# generate data set names to be assigned to elements in the list
fileNameTokens <- strsplit(fileList,"/|[.]")

theNames <- unlist(lapply(fileNameTokens,function(x){
     paste0(x[1],".",x[2])
}))
names(data) <- theNames
# print first six rows of file 1 based on named extract
data[["year1.beer"]][1:6,]

...and the output.

> data[["year1.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
> 

Next, we'll print the first few rows of the second file.

> # print first six rows of file 2 based on named extract
> data[["year2.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 23847
2     2 Budweiser 33847
3     3 Budweiser 44400
4     4 Budweiser 35333
5     5 Budweiser 18710
6     6 Budweiser 63108
> 
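The name-building step can also be written with path helpers instead of indexing into the strsplit() tokens. This is only a sketch, assuming the Year<x>/<product>/<product>.Rda layout shown in the question; the vectors files, year, product and objNames are illustrative names, and no files are actually read.

```r
# Paths in the layout shown in the question (nothing is read from disk).
files <- c("Year1/beer/beer.Rda", "Year2/beer/beer.Rda", "Year1/milk/milk.Rda")

# First path component, e.g. "Year1"
year <- vapply(strsplit(files, "/", fixed = TRUE), `[`, character(1), 1)

# File name without directory or extension, e.g. "beer"
product <- tools::file_path_sans_ext(basename(files))

# Combine into names such as "Year1.beer"
objNames <- paste0(year, ".", product)
objNames
```

The resulting vector could then be passed to names() in the same way as theNames.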

If one needs to access the data frames directly, without going through the list names, they can be assigned to the global environment from within the lapply() function via the assign() function, as noted in the other answer. With pos=1, assign() writes to the first entry on the search path, which is the global environment.

# alternate form, assigning directly to the global environment

data <- lapply(fileList,function(x){
     # x is the filename, parse into strings to generate data set name
     fileNameTokens <- unlist(strsplit(x,"/|[.]"))
     assign(paste0(fileNameTokens[1],".",fileNameTokens[2]), read.csv(x),pos=1)
})
head(year1.beer)

...and the output.

> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
> 
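If the named list has already been built, base R's list2env() can copy its elements into an environment in one call, as an alternative to calling assign() inside the loop. The sketch below uses a small made-up list in place of the real data and a fresh environment called target (both are assumptions for illustration).

```r
# Stand-in for the named list built earlier; the values are made up.
data <- list(
  year1.beer = data.frame(Month = 1:2, Sales = c(10, 20)),
  year2.beer = data.frame(Month = 1:2, Sales = c(30, 40))
)

# Copy every list element into an environment under its list name.
# A fresh environment is used here; passing envir = .GlobalEnv instead
# would have the same effect as the assign() calls.
target <- new.env()
invisible(list2env(data, envir = target))

ls(target)
head(target$year1.beer)
```

Keeping the objects in a list or a dedicated environment avoids filling the global environment with dozens of data frames, which may matter with 31 products per year.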

The same technique works with RDS files: replace read.csv() with readRDS(), assuming fileList now holds the paths to the .rds files.

data <- lapply(fileList,function(x){
     # x is the filename, parse into strings to generate data set name
     fileNameTokens <- unlist(strsplit(x,"/|[.]"))
     assign(paste0(fileNameTokens[1],".",fileNameTokens[2]), readRDS(x),pos=1)
})
head(year1.beer)

...and the output.

> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
>

One option might be to load each file into a new environment and then assign its contents to a custom-named object in the global environment.

This is modified from https://stackoverflow.com/a/5577647/6561924

# first create custom names for objects (e.g. add folder names)
file_names <- gsub("/", "_", files)
file_names <- gsub("\\.Rda$", "", file_names)  # strip only a trailing .Rda

# function to load objects in new environ
load_obj <- function(f, f_name) {
  env <- new.env()
  nm <- load(f, env)[1]  # load into new environ and capture name
  assign(f_name, env[[nm]], pos = 1) # pos = 1 is the global environment
}

# load all
mapply(load_obj, files, file_names)
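The same idea can return a named list instead of creating global objects. The following is a minimal sketch, not the answer's own code: it first writes two tiny .Rda files with invented data into a temporary directory so that it can run on its own, and load_first is a hypothetical helper name.

```r
# Create two small .Rda files that both contain an object called `beer`,
# mimicking the Year1/Year2 layout. The data are invented.
root <- tempfile("rda_demo_")
for (yr in c("Year1", "Year2")) {
  dir.create(file.path(root, yr, "beer"), recursive = TRUE)
  beer <- data.frame(Month = 1:3, Sales = if (yr == "Year1") 1:3 else 4:6)
  save(beer, file = file.path(root, yr, "beer", "beer.Rda"))
}
rm(beer)

# Relative paths such as "Year1/beer/beer.Rda"
files <- list.files(root, pattern = "\\.Rda$", recursive = TRUE)

# Hypothetical helper: load one file into a private environment and
# return its first object, so nothing in the caller is overwritten.
load_first <- function(f) {
  env <- new.env()
  nm <- load(f, envir = env)[1]
  env[[nm]]
}

loaded <- lapply(file.path(root, files), load_first)
names(loaded) <- sub("\\.Rda$", "", gsub("/", "_", files))

names(loaded)
loaded[["Year2_beer_beer"]]$Sales
```

Like load_obj above, this keeps only the first object stored in each file, which is enough when every .Rda file holds a single data frame.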
