Loading multiple files into R at the same time (with similar file names)

I am trying to load multiple files into an R environment. I have tried something like the following:

files <- list.files(pattern = ".Rda", recursive = TRUE)

lapply(files,load,.GlobalEnv)

This only loads one data file (incorrectly). The problem I am finding is that the files have the same names in every year; for example, there is both "Year1/beer/beer.Rda" and "Year2/beer/beer.Rda".

I am trying to rename the data objects on import, so that beer1 and beer2 correspond to beer year 1 and beer year 2, and so on.

Does anybody have a better method of loading the data? I have more than two years' worth of data.

File names:

 [1] "Year1/beer/beer.Rda"         "Year1/blades/blades.Rda"     "Year1/carbbev/carbbev.Rda"  
 [4] "Year1/cigets/cigets.Rda"     "Year1/coffee/coffee.Rda"     "Year1/coldcer/coldcer.Rda"  
 [7] "Year1/deod/deod.Rda"         "Year1/diapers/diapers.Rda"   "Year1/factiss/factiss.Rda"  
[10] "Year1/fzdinent/fzdinent.Rda" "Year1/fzpizza/fzpizza.Rda"   "Year1/hhclean/hhclean.Rda"  
[13] "Year1/hotdog/hotdog.Rda"     "Year1/laundet/laundet.Rda"   "Year1/margbutr/margbutr.Rda"
[16] "Year1/mayo/mayo.Rda"         "Year1/milk/milk.Rda"         "Year1/mustketc/mustketc.Rda"
[19] "Year1/paptowl/paptowl.Rda"   "Year1/peanbutr/peanbutr.Rda" "Year1/photo/photo.Rda"      
[22] "Year1/razors/razors.Rda"     "Year1/saltsnck/saltsnck.Rda" "Year1/shamp/shamp.Rda"      
[25] "Year1/soup/soup.Rda"         "Year1/spagsauc/spagsauc.Rda" "Year1/sugarsub/sugarsub.Rda"
[28] "Year1/toitisu/toitisu.Rda"   "Year1/toothbr/toothbr.Rda"   "Year1/toothpa/toothpa.Rda"  
[31] "Year1/yogurt/yogurt.Rda"     "Year2/beer/beer.Rda"         "Year2/blades/blades.Rda"    
[34] "Year2/carbbev/carbbev.Rda"   "Year2/cigets/cigets.Rda"     "Year2/coffee/coffee.Rda"    
[37] "Year2/coldcer/coldcer.Rda"   "Year2/deod/deod.Rda"         "Year2/diapers/diapers.Rda"  
[40] "Year2/factiss/factiss.Rda"   "Year2/fzdinent/fzdinent.Rda" "Year2/fzpizza/fzpizza.Rda"  
[43] "Year2/hhclean/hhclean.Rda"   "Year2/hotdog/hotdog.Rda"     "Year2/laundet/laundet.Rda"  
[46] "Year2/margbutr/margbutr.Rda" "Year2/mayo/mayo.Rda"         "Year2/milk/milk.Rda"        
[49] "Year2/mustketc/mustketc.Rda" "Year2/paptowl/paptowl.Rda"   "Year2/peanbutr/peanbutr.Rda"
[52] "Year2/photo/photo.Rda"       "Year2/razors/razors.Rda"     "Year2/saltsnck/saltsnck.Rda"
[55] "Year2/shamp/shamp.Rda"       "Year2/soup/soup.Rda"         "Year2/spagsauc/spagsauc.Rda"
[58] "Year2/sugarsub/sugarsub.Rda" "Year2/toitisu/toitisu.Rda"   "Year2/toothbr/toothbr.Rda"  
[61] "Year2/toothpa/toothpa.Rda"   "Year2/yogurt/yogurt.Rda"
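The clash described in the question can be reproduced with a small sketch. It assumes, as the file listing suggests, that each .Rda file stores a single object named after its folder (here `beer`); the temporary directory and the toy `sales` values are made up for illustration.

```r
# Sketch: two .Rda files in different folders that both store an object called `beer`
# (folder layout mirrors the question; the data values are hypothetical)
root <- file.path(tempdir(), "rda_clash_demo")
for (yr in c("Year1", "Year2")) {
  dir.create(file.path(root, yr, "beer"), recursive = TRUE, showWarnings = FALSE)
  beer <- data.frame(sales = if (yr == "Year1") 100 else 200)
  save(beer, file = file.path(root, yr, "beer", "beer.Rda"))
}
rm(beer)

# list.files() returns the paths sorted, so Year1 comes before Year2
files <- list.files(root, pattern = "\\.Rda$", recursive = TRUE, full.names = TRUE)

# load() restores each object under its saved name, so the second file overwrites the first
invisible(lapply(files, load, envir = globalenv()))
beer$sales  # 200: only the Year2 data survives
```

Note the anchored pattern "\\.Rda$": list.files() treats `pattern` as a regular expression, so an unanchored ".Rda" would also match names where any character precedes "Rda".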

One solution is to parse the file names and use them as the names of the elements in a list of data frames. We'll use some sample data with monthly sales of beer brands across two years, saved as CSV files in two subdirectories, year1 and year2.

We will use lapply() to read the files into a list of data frames, and then use the names() function to name each element by prefixing the file name (excluding .csv) with year<x>., so that year1/beer.csv becomes year1.beer.
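The name-building step can be checked on its own before any files are read. This sketch uses only the two example paths; vapply() is swapped in for the unlist(lapply()) combination used in the answer's code, as a stricter alternative that errors if any path fails to produce exactly one string.

```r
# Split each path on "/" or "." to get folder, base name and extension
fileList <- c("year1/beer.csv", "year2/beer.csv")
fileNameTokens <- strsplit(fileList, "/|[.]")
fileNameTokens[[1]]  # "year1" "beer" "csv"

# Join folder and base name; vapply() enforces one character string per file
theNames <- vapply(fileNameTokens, function(x) paste0(x[1], ".", x[2]), character(1))
theNames  # "year1.beer" "year2.beer"
```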

fileList <- c("year1/beer.csv","year2/beer.csv")

data <- lapply(fileList,function(x){
     read.csv(x)
})
# generate data set names to be assigned to elements in the list
fileNameTokens <- strsplit(fileList,"/|[.]")

theNames <- unlist(lapply(fileNameTokens,function(x){
     paste0(x[1],".",x[2])
}))
names(data) <- theNames
# print first six rows of file 1 based on named extract
data[["year1.beer"]][1:6,]

...and the output:

> data[["year1.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
> 

Next, we'll print the first few rows of the second file.

> # print first six rows of file 2 based on named extract
> data[["year2.beer"]][1:6,]
  Month      Item Sales
1     1 Budweiser 23847
2     2 Budweiser 33847
3     3 Budweiser 44400
4     4 Budweiser 35333
5     5 Budweiser 18710
6     6 Budweiser 63108
> 
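The same named-list idea carries over to the question's .Rda files, with load() into a throwaway environment in place of read.csv(). This is a minimal sketch assuming each file holds one object and every path has the form YearN/category/file.Rda; `load_rda_list()` is a hypothetical helper name and the demo data are made up.

```r
# Hypothetical helper: read every .Rda under `root` into a list named "YearN.category"
load_rda_list <- function(root) {
  # relative paths such as "Year1/beer/beer.Rda"
  files <- list.files(root, pattern = "\\.Rda$", recursive = TRUE)
  out <- lapply(files, function(f) {
    env <- new.env()
    obj_names <- load(file.path(root, f), envir = env)  # load() returns the restored names
    env[[obj_names[1]]]                                 # assumes one object per file
  })
  # "Year1/beer/beer.Rda" -> "Year1.beer"
  names(out) <- sub("^([^/]+)/([^/]+)/.*$", "\\1.\\2", files)
  out
}

# Demo with two throwaway files (hypothetical data)
root <- file.path(tempdir(), "rda_list_demo")
for (yr in c("Year1", "Year2")) {
  dir.create(file.path(root, yr, "beer"), recursive = TRUE, showWarnings = FALSE)
  beer <- data.frame(sales = if (yr == "Year1") 100 else 200)
  save(beer, file = file.path(root, yr, "beer", "beer.Rda"))
}
dat <- load_rda_list(root)
names(dat)  # "Year1.beer" "Year2.beer"
```

Because nothing is written to the global environment, the same-named objects inside the files can no longer overwrite each other.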

If you need to access the data frames directly as standalone objects rather than through the list's names, you can create them in the global environment from inside the lapply() call with assign(), as noted in the other answer.

# alternate form, assigning directly to the global environment

data <- lapply(fileList,function(x){
     # x is the filename; split it into tokens to build the data set name
     fileNameTokens <- unlist(strsplit(x,"/|[.]"))
     assign(paste0(fileNameTokens[1],".",fileNameTokens[2]), read.csv(x), envir = .GlobalEnv)
})
head(year1.beer)

...and the output:

> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
> 
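If the named list from the first approach is already in memory, base R's list2env() can copy every element into an environment in one call, instead of calling assign() inside lapply(). A sketch with a small stand-in list (the data frames here are made up):

```r
# Stand-in for the named list built earlier (hypothetical values)
data <- list(
  year1.beer = data.frame(Month = 1:2, Sales = c(10, 20)),
  year2.beer = data.frame(Month = 1:2, Sales = c(30, 40))
)

# Create one object per list element, named after the element
invisible(list2env(data, envir = globalenv()))
year2.beer$Sales  # 30 40
```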

The technique also works with RDS files (files written with saveRDS()), as long as fileList points to the .rds files:

fileList <- c("year1/beer.rds","year2/beer.rds")

data <- lapply(fileList,function(x){
     # x is the filename; split it into tokens to build the data set name
     fileNameTokens <- unlist(strsplit(x,"/|[.]"))
     assign(paste0(fileNameTokens[1],".",fileNameTokens[2]), readRDS(x), envir = .GlobalEnv)
})
head(year1.beer)

...and the output:

> head(year1.beer)
  Month      Item Sales
1     1 Budweiser 83047
2     2 Budweiser 38374
3     3 Budweiser 47287
4     4 Budweiser 18417
5     5 Budweiser 23981
6     6 Budweiser 55471
>
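One caveat for the question's files: readRDS() reads files written by saveRDS(), whereas .Rda files are normally written by save() and must be read with load(). The difference matters for naming: readRDS() returns the value so the caller picks the name, while load() restores objects under their saved names and returns those names. A small sketch with temporary files (the data are sample values):

```r
beer <- data.frame(Sales = c(83047, 38374))
rds_file <- tempfile(fileext = ".rds")
rda_file <- tempfile(fileext = ".Rda")

saveRDS(beer, rds_file)          # stores the value only
save(beer, file = rda_file)      # stores the object together with its name

year1.beer <- readRDS(rds_file)  # the caller chooses the name

env <- new.env()
restored <- load(rda_file, envir = env)  # returns the saved name(s)
restored                                 # "beer"
identical(env$beer, year1.beer)
```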

One option is to load each file into a new environment and then assign its contents to a custom-named object in the global environment.

This is modified from https://stackoverflow.com/a/5577647/6561924

# first create custom names for objects (e.g. add folder names)
# "Year1/beer/beer.Rda" becomes "Year1_beer_beer"
file_names <- gsub("/", "_", files)
file_names <- gsub("\\.Rda$", "", file_names)

# function to load objects in a new environment
load_obj <- function(f, f_name) {
  env <- new.env()
  nm <- load(f, env)[1]  # load into the new environment and capture the (first) object name
  assign(f_name, env[[nm]], pos = 1) # pos = 1 is the global environment
}

# load all; invisible() suppresses the printed list of return values
invisible(mapply(load_obj, files, file_names))
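load_obj() keeps only the first object in each file (the [1] after load()). If some .Rda files might hold several objects, a variant can assign all of them, prefixing each saved name. This is a hypothetical extension: `load_all_objs()` and the prefix "Year1_beer" are illustrative names, and the demo file is made up.

```r
# Hypothetical variant of load_obj(): assign every object in the file, not just the first
load_all_objs <- function(f, prefix, target = globalenv()) {
  env <- new.env()
  obj_names <- load(f, envir = env)
  for (nm in obj_names) {
    assign(paste0(prefix, "_", nm), env[[nm]], envir = target)
  }
  invisible(paste0(prefix, "_", obj_names))
}

# Demo: one file holding two objects (hypothetical data)
beer <- data.frame(Sales = 1:3)
beer_meta <- "units sold"
f <- tempfile(fileext = ".Rda")
save(beer, beer_meta, file = f)

created <- load_all_objs(f, "Year1_beer")
created
```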
