How to process multiple CSV files in R
I have many CSV files in three separate folders, as follows:
folder1
    a1_0023.csv
    a2_0034.csv
    a3_6163.csv
    ...
    (100 files)
folder2
    b1_0023.csv
    b2_0034.csv
    b3_6163.csv
    ...
    (100 files)
folder3
    c1_0023.csv
    c2_0034.csv
    c3_6163.csv
    ...
    (100 files)
And I have a text file that lists the last four digits of the file names:
theLastFourDigits.txt
0023
0034
6163
...
(100 lines)
For the 0023 files, I do a simple job in R:
a <- read.table("D:/folder1/a1_0023.csv", header=FALSE, sep=",")
a <- as.matrix(a)
b <- read.table("D:/folder2/b1_0023.csv", header=FALSE, sep=",")
b <- as.matrix(b)
c <- read.table("D:/folder3/c1_0023.csv", header=FALSE, sep=",")
c <- as.matrix(c)
# Initiate the column vector that contains the results
myanswer <- matrix(0, nrow=100, ncol=1)
# Do a simple job, and store the result in myanswer column
myanswer[1] = nrow(a)*nrow(b)/nrow(c)
I have two questions here: (1) How can I repeat this process for all 100 four-digit suffixes? (2) How can I run the same jobs if I don't have the theLastFourDigits.txt list file?
EDIT:
I tried something like the following:
setwd("D:/folder1/")
filelist1 <- Sys.glob("*.csv")
setwd("D:/folder2/")
filelist2 <- Sys.glob("*.csv")
setwd("D:/folder3/")
filelist3 <- Sys.glob("*.csv")
for (i in 1:100) {
  setwd("D:/folder1/")
  a <- read.csv(filelist1[i], header=FALSE, sep=",")
  a <- as.matrix(a)
  setwd("D:/folder2/")
  b <- read.csv(filelist2[i], header=FALSE, sep=",")
  b <- as.matrix(b)
  setwd("D:/folder3/")
  c <- read.csv(filelist3[i], header=FALSE, sep=",")
  c <- as.matrix(c)
  nrow(a)*nrow(b)/nrow(c)
}
And the error message is:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
3 stop("no lines available in input")
2 read.table(file = file, header = header, sep = sep, quote = quote,
dec = dec, fill = fill, comment.char = comment.char, ...)
1 read.csv(filelist1[i], header = FALSE, sep = ",")
What am I missing here?
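On the error itself: read.table() stops with "no lines available in input" when the file it is given contains no data, so one likely cause is that at least one of the CSV files is empty. A minimal sketch for finding such files before running the loop is shown here. The helper name find_empty_csv is hypothetical, and a size check like this would not catch a file that contains only blank lines.

```r
# Hypothetical helper: return the CSV files in `dir` that are zero bytes long.
find_empty_csv <- function(dir) {
  # full.names = TRUE keeps the directory in each path, so no setwd() is needed
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  sizes <- file.size(files)
  # keep only files whose size is known and equal to zero
  files[!is.na(sizes) & sizes == 0]
}

# Example with the first folder from the question:
# find_empty_csv("D:/folder1")
```

Any file this returns could be skipped or inspected by hand before the main loop is run.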
For question (2), you might find this function useful. I have used it in the past to read in all the CSV files in a given folder (Windows 7). You would need to modify the read.csv() arguments as needed for your application. Once all the data from a folder has been read in, you can convert all the data frames to matrices with lapply().
list.csv <- function(mydir, add.source=TRUE) {
  # combine all csv files in a given directory into a single list
  # match only names ending in ".csv" (the dot is escaped so it is literal)
  filenames <- list.files(mydir, pattern="\\.csv$")
  nfiles <- length(filenames)
  # create an empty list where all the files will be stored
  files.list <- vector(mode="list", length=nfiles)
  # seq_len() makes the loop a no-op when the directory has no csv files
  for(i in seq_len(nfiles)) {
    # read the data into a temporary data frame
    temp <- read.csv(file.path(mydir, filenames[i]), as.is=TRUE)
    # add a new column identifying the source file
    if(add.source) temp$source <- filenames[i]
    # put the data into the list
    files.list[[i]] <- temp
  }
  files.list
}
mylist <- list.csv("C:/temp/")
# look at headers from all the data frames
lapply(mylist, head)
# convert all the data frames to matrices
mylistm <- lapply(mylist, as.matrix)
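For question (1), and for question (2) when the list file is missing, one option is to derive the four-digit suffixes from the file names themselves and match the three folders on them. A minimal sketch follows, assuming every file name ends in `_NNNN.csv` as in the question and that each suffix appears at most once per folder. The helper names suffix_index and row_ratio_by_suffix are hypothetical.

```r
# Hypothetical helper: map each four-digit suffix to the full path of its file.
suffix_index <- function(dir) {
  files <- list.files(dir, pattern = "_[0-9]{4}\\.csv$", full.names = TRUE)
  # extract the four digits that sit between the last "_" and ".csv"
  suffix <- sub("^.*_([0-9]{4})\\.csv$", "\\1", basename(files))
  setNames(files, suffix)
}

# Hypothetical helper: compute nrow(a) * nrow(b) / nrow(c) for every suffix
# that is present in all three folders.
row_ratio_by_suffix <- function(dir_a, dir_b, dir_c) {
  fa <- suffix_index(dir_a)
  fb <- suffix_index(dir_b)
  fc <- suffix_index(dir_c)
  # suffixes shared by all three folders; no list file is required
  common <- Reduce(intersect, list(names(fa), names(fb), names(fc)))
  count_rows <- function(path) nrow(read.csv(path, header = FALSE))
  # the result is a numeric vector named by suffix
  vapply(common, function(s) {
    count_rows(fa[[s]]) * count_rows(fb[[s]]) / count_rows(fc[[s]])
  }, numeric(1))
}

# Example with the folders from the question:
# myanswer <- row_ratio_by_suffix("D:/folder1", "D:/folder2", "D:/folder3")
```

Matching on the suffix avoids relying on the three file listings being in the same order. If theLastFourDigits.txt is available, its lines (read with readLines()) could be used in place of the intersect() result. An empty CSV file would still make read.csv() fail, so such files need to be handled first.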