How to read multiple csv files with Reduce in R

I'm trying to extract one column from multiple .csv files using Reduce. What I have is:

a vector with the path to every single .csv file,

filepaths

a function that reads a .csv file and returns one of its columns,

getData <- function(path,column) {
   d = read.csv(path)
   d[,column]
}

and a call to Reduce that applies getData to every file path and stores the results in a single collection (for demonstration I only take the first three paths):

Reduce(function(path,acc) append(acc, getData(path,column)), filepaths[1:3],c())

If I do this, I get the following error, which occurs when read.csv is called with one of the file paths:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : 'file' must be a character string or connection

This is strange, because if I call the getData function manually like this

getData(filepaths[1],col)
getData(filepaths[2],col)
getData(filepaths[3],col)

it works.

I know I could do this with a for loop, but I want to understand what the problem is.

You could use fread from data.table to read in only the desired column, instead of reading each entire csv and then dropping all columns but one, as your function does.

library(data.table)
unlist(lapply(filepaths, fread, select= "colname")) #output is a vector

I just figured it out. The problem is that Reduce expects a function that takes the accumulator as its FIRST parameter and the element as its second. I had them switched. So the solution looks like this:

getData <- function(path,column) {
  d = read.csv(path)
  d[,column]
}

Reduce(function(acc,path) append(acc, getData(path,column)), filepaths[1:3],c())

Thanks for the hint about fread. I see that it is much better than read.csv for this.
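To make the argument order concrete, here is a minimal sketch that records what Reduce passes to the function on each call. The temporary csv files, the column name "value", and the `calls` log are made up for the demonstration and are not part of the original question. With an initial value of c() (which is NULL), the first call receives NULL as its first argument, so in the swapped version NULL ends up being handed to read.csv as the path.

```r
# Hypothetical demo data: two small csv files in a temporary directory
tmp <- tempfile("reduce_demo_")
dir.create(tmp)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(tmp, "b.csv"), row.names = FALSE)
filepaths <- file.path(tmp, c("a.csv", "b.csv"))
column <- "value"

getData <- function(path, column) {
  d <- read.csv(path)
  d[, column]
}

# Correct order: (accumulator, element). Log the arguments of every call.
calls <- list()
result <- Reduce(function(acc, path) {
  calls[[length(calls) + 1]] <<- list(acc = acc, path = path)
  append(acc, getData(path, column))
}, filepaths, c())

is.null(calls[[1]]$acc)  # TRUE: the first accumulator is the initial value c()
calls[[1]]$path          # the first file path
result                   # the "value" columns of both files, concatenated

# Swapped order: the first argument (here named `path`) receives NULL,
# so read.csv fails. The error message is captured instead of aborting.
swapped <- tryCatch(
  Reduce(function(path, acc) append(acc, getData(path, column)),
         filepaths, c()),
  error = function(e) conditionMessage(e)
)
swapped
```

Calling getData by hand works because there the path really is the first argument; only inside Reduce do the positions matter.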

Reduce() is meant for functions that take two values of the same kind and return a value of that kind. For example, reduceFun(x1, x2), which compares x1 and x2 and returns the larger one, is first called with x1 and x2 being the first two elements of the vector; the result is then passed as x1 and the third element as x2, and so on:

reduceFun <- function(x1,x2) 
{
  print(paste("x1=",x1, " : x2=",x2, " : max=",max(x1,x2)));
  return(max(x1,x2))
}
> res <- Reduce(reduceFun, 1:10)
[1] "x1= 1  : x2= 2  : max= 2"
[1] "x1= 2  : x2= 3  : max= 3"
[1] "x1= 3  : x2= 4  : max= 4"
[1] "x1= 4  : x2= 5  : max= 5"
[1] "x1= 5  : x2= 6  : max= 6"
[1] "x1= 6  : x2= 7  : max= 7"
[1] "x1= 7  : x2= 8  : max= 8"
[1] "x1= 8  : x2= 9  : max= 9"
[1] "x1= 9  : x2= 10  : max= 10"
> res
[1] 10

So Reduce() is probably not what you want to use here; there are many other ways, as shown in the other answers.

This works for me!
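As one possible alternative in base R, each file can be read independently with lapply and the resulting list flattened with unlist, so no accumulator is needed. This is only a sketch: the temporary files and the column name "value" are invented for the example.

```r
# Hypothetical demo data: two small csv files in a temporary directory
tmp <- tempfile("lapply_demo_")
dir.create(tmp)
write.csv(data.frame(id = 1:2, value = c(1.5, 2.5)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(3.5, 4.5)),
          file.path(tmp, "b.csv"), row.names = FALSE)
filepaths <- file.path(tmp, c("a.csv", "b.csv"))

# Read one column from a single file
getData <- function(path, column) {
  d <- read.csv(path)
  d[, column]
}

# One list element per file; extra arguments to lapply are passed on to getData
per_file <- lapply(filepaths, getData, column = "value")

# Flatten the list into a single vector
values <- unlist(per_file)
values
```

Keeping the intermediate list (`per_file`) can be handy when you later need to know which values came from which file.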

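library(data.table)

# Folder containing the csv files (placeholder path, adjust to your machine)
WD <- "C:/Users/your_path_here/CSV Files/"
setwd(WD)

# Empty table with the expected columns, used as the starting point
data <- data.table(read.csv(text = "CashFlow,Cusip,Period"))

csv.list <- list.files(WD)
k <- 1

for (i in csv.list) {
  temp.data <- read.csv(i)                    # relative path works because of setwd(WD)
  data <- data.table(rbind(data, temp.data))  # append the rows of this file

  # Report progress (fraction of files done) every 100 files
  if (k %% 100 == 0)
    print(k / length(csv.list))

  k <- k + 1
}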
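Growing a table with rbind inside a loop copies the accumulated rows on every iteration. A common variation, sketched here in base R under made-up demo data, is to read all files into a list first and bind them once at the end. The file names and values are hypothetical; only the column names CashFlow, Cusip and Period come from the code above.

```r
# Hypothetical demo data: two small csv files in a temporary directory
tmp <- tempfile("bind_demo_")
dir.create(tmp)
write.csv(data.frame(CashFlow = c(100, 200), Cusip = c("A1", "B2"), Period = 1:2),
          file.path(tmp, "f1.csv"), row.names = FALSE)
write.csv(data.frame(CashFlow = 300, Cusip = "C3", Period = 3L),
          file.path(tmp, "f2.csv"), row.names = FALSE)

# Full paths, restricted to csv files, so setwd() is not needed
csv.list <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames, then bind them in one step
tables <- lapply(csv.list, read.csv)
data <- do.call(rbind, tables)

nrow(data)   # 3 rows: two from f1.csv and one from f2.csv
names(data)
```

If data.table is already loaded, rbindlist(tables) should serve the same purpose as the do.call(rbind, ...) step.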
