Loop in R to read many files

I have been wondering if anybody knows a way to write a loop that loads files/databases in R. Say I have some files like this: data1.csv, data2.csv, ..., data100.csv.

In some programming languages you can write something like data +{ x }+ .csv, which the system reads as datax.csv, and then loop over x.

Any ideas?

Sys.glob() is another possibility; its sole purpose is globbing, i.e. wildcard expansion.

dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)

That will read all the files of the form data[x].csv into the list dataFiles, where [x] is any string, including the empty one.

[Note that this is a different pattern from the one in @Joshua's answer. There, list.files() takes a regular expression, whereas Sys.glob() just uses standard wildcards; which wildcards can be used is system dependent, and the details can be found on the help page ?Sys.glob.]
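To make the difference concrete, here is a small sketch that matches the same files once with a wildcard and once with a regular expression. The temporary directory and the file names are made up for the illustration; glob2rx() from the utils package translates a wildcard into a regular expression.

```r
# Illustration only: a throwaway directory with made-up file names
tmp <- file.path(tempdir(), "glob_demo")
dir.create(tmp, showWarnings = FALSE)
for (f in c("data1.csv", "data2.csv", "notes.txt")) {
  write.csv(data.frame(x = 1:3), file.path(tmp, f), row.names = FALSE)
}

# Wildcard matching: "*" stands for any run of characters
by_glob <- basename(Sys.glob(file.path(tmp, "data*.csv")))

# Regex matching: "." means any character, so a literal dot must be escaped
by_regex <- list.files(tmp, pattern = "^data.*\\.csv$")

# glob2rx() turns the wildcard into a regular expression for list.files()
by_glob2rx <- list.files(tmp, pattern = glob2rx("data*.csv"))
```

All three calls should pick up data1.csv and data2.csv and skip notes.txt.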

See ?list.files.

myFiles <- list.files(pattern="data.*csv")

Then you can loop over myFiles.
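One way to write that loop, as a minimal sketch. So that it runs anywhere, it first writes three made-up files to a temporary directory; demo_dir and dataList are placeholder names, and it assumes every file can be read by read.csv() with default settings.

```r
# Illustration only: write three small made-up files to a temporary directory
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(demo_dir, paste0("data", i, ".csv")),
            row.names = FALSE)
}

# full.names = TRUE returns paths that can be passed straight to read.csv()
myFiles <- list.files(demo_dir, pattern = "data.*csv", full.names = TRUE)

# One list slot per file, named after the file
dataList <- vector("list", length(myFiles))
names(dataList) <- basename(myFiles)

# seq_along() also behaves when nothing matched: the loop body is skipped
for (i in seq_along(myFiles)) {
  dataList[[i]] <- read.csv(myFiles[i])
}

# Each data frame can then be looked up by file name
dataList[["data2.csv"]]$value
```

Keeping the data frames in a named list means later steps can use lapply() over them instead of repeating code per file.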

I would put all the CSV files in a directory, create a list, and loop over the directory listing to read each CSV file into that list.

setwd("~/Documents/")
ldf <- list() # creates an empty list
listcsv <- dir(pattern = "\\.csv$") # names of all the csv files in the directory (pattern is a regex)
for (k in seq_along(listcsv)) { # seq_along() is safe when no file matches
  ldf[[k]] <- read.csv(listcsv[k])
}
str(ldf[[1]]) # inspect the first data frame

Read the headers in each file so that we can use them in the merged file.

library(dplyr)
library(readr)

list_file <- list.files(pattern = "\\.csv$") %>%
  lapply(read.csv, stringsAsFactors = FALSE) %>%
  bind_rows()

# directory_path is the folder that holds your files
fi <- list.files(directory_path, full.names = TRUE)
dat <- lapply(fi, read.csv)

dat will contain the datasets in a list.

Let's assume that your files have the file format that you mentioned in your question and that they are located in the working directory.

You can vectorise the creation of the file names if they follow a simple naming structure. Then apply a loading function to all the files (here I used the purrr package, but you can also use lapply).

library(purrr)
c(1:100) %>% paste0("data", ., ".csv") %>% map(read.csv)
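For comparison, a base R sketch of the same idea with lapply. The file.exists() filter is an addition, on the assumption that some numbers in the sequence may be missing; file_names, present and all_data are placeholder names.

```r
# Build the names data1.csv ... data100.csv in one vectorised call
file_names <- paste0("data", 1:100, ".csv")

# Keep only the files that are actually present, so a gap in the
# numbering does not stop the whole run (an assumption about your data)
present <- file_names[file.exists(file_names)]

# Read each file; the result is a list named after the files
all_data <- lapply(present, read.csv)
names(all_data) <- present
```

If every file from 1 to 100 is guaranteed to exist, the filter can be dropped and file_names passed to lapply() directly.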

Here's another solution using a for loop. I like it better than the others because of its flexibility and because all data frames are stored directly in the global environment.

Assuming you've already set your working directory, the loop reads the files one by one and stores each in the global environment under the name data<i> (data1, data2, ...).

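ids <- 1:100 # not named `list`, which would mask base::list
for (i in ids) {
  obj_name <- paste0("data", i)     # name of the object to create
  path <- paste0("data", i, ".csv") # file to read
  assign(obj_name, read.csv(path))
}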
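Objects created with assign() can be fetched again by name with get() or mget(). The sketch below uses a separate environment (demo_env, a made-up name) and tiny stand-in data frames so that it runs without any files and does not touch your session.

```r
# Illustration only: create three small objects the same way the loop does,
# but inside a separate environment so nothing in your session is overwritten
demo_env <- new.env()
for (i in 1:3) {
  assign(paste0("data", i), data.frame(id = i), envir = demo_env)
}

# get() fetches one object by a name built at run time
second <- get(paste0("data", 2), envir = demo_env)

# mget() fetches several at once and returns them as a named list
as_list <- mget(paste0("data", 1:3), envir = demo_env)
```

For objects in the global environment, the envir argument of get() can be left out, and mget() can be given envir = globalenv().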

setwd("C:/yourpath")

temp <- list.files(pattern = "\\.csv$")

allData <- do.call("rbind", lapply(temp, read.csv))

This may be helpful if you have one dataset per participant, as in psychology, sports, medicine, etc.

library(haven) # provides read_sav()

setwd("C:/yourpath")

temp <- list.files(pattern = "\\.sav$")

# Maybe you want to exclude certain IDs
DEL <- grepl("ID(04|08|11|13|19)\\.sav", temp)
temp2 <- temp[!DEL] # logical indexing also works when nothing matches

# Make a list that contains all data
read.all <- lapply(temp2, read_sav)
# View(read.all[[1]])

# Option 1: stack the datasets on top of each other
df <- do.call("rbind", read.all)

# Option 2: compute something within each dataset (one ID per file), e.g. the mean of certain variables for each participant

mw_extraktion <- function(data_raw) {
  data_raw <- data.frame(data_raw)
  # You may now calculate e.g. the mean of a certain variable for each ID
  ID <- data_raw$ID[1]
  # Var2 and Var3 are placeholders: compute your new variables (e.g. means) before this line
  data_OneID <- c(ID, Var2, Var3)
  data_OneID # return the result explicitly
} # end of function
data_combined <- t(data.frame(sapply(read.all, mw_extraktion)))
