
How can I apply a function to a set of .csv files in a particular directory?

Despite a lot of research and several attempts with lapply (which I think, or hope, is the correct apply function), I have been unable to achieve the following and would like some guidance. I want to read in all the files in a single directory and merge them into a single data frame, making sure that each file has its first seven rows deleted before the merge.

(Note that all files contain the same column headings and the same data types.)

I have tried this, but it clearly falls short of what I want to achieve:

files <- list.files(pattern = "*.csv") # Gather a list of everything in the directory that is a .csv file.
aconex <- lapply(files, fread) # Use lapply (I think this is correct) to apply the fread() function (from the data.table package) to each .csv file

This results in everything being stored in a vector, whereas I want the output to be a data frame.

There has to be a better approach; I just can't seem to figure it out.

Can anybody suggest a better solution?

UPDATE:

Alternatively, I have written a for loop which partially achieves what I want. The problem is that it only saves a single file's worth of data to the data frame (there are 15 files in total):

for(x in list.files(pattern = "*.csv")){
  df <- data.table::fread(x)
  df <- df[-(1:7), ]
  colnames(df) <- as.character(unlist(df[1,]))
  df <- df[-(1), ]
}

Once the first seven rows have been removed, I use the new first row as the column names and then remove that row as well. Again, what is a better way to achieve this?

Ideally, I want the resulting output to be x data frames (df1, df2, ..., dfX) that I can then merge, but, again, there has to be a better way. What is it?

Put simply, I want each file to be read into its own data frame, then the values in row 8 to be used as the column headings, and then the first eight rows removed (I only keep the eighth row so that I can use it for the column headings before removing it).

This can be done by creating an anonymous function that reads each file with read.csv and uses the skip argument to skip the first seven lines, so that line 8 is read as the header. Then you can stick all the data.frames together with do.call.

# pattern is a regular expression, so "\\.csv$" matches names ending in ".csv"
files <- list.files(pattern = "\\.csv$")

# create f, which is a list of data frames (one per file)
f <- lapply(files, function(m) read.csv(m, skip = 7, header = TRUE))

# stick them all together with do.call and rbind
f_combine <- do.call("rbind", f)
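
To try the approach without touching real data, here is a minimal, self-contained sketch. The temporary directory, the helper write_sample, the file names, and the column names id and value are all made up for illustration. It assumes each file has seven preamble lines followed by a header line.

```r
# Create a fresh temporary directory to hold the hypothetical sample files.
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)

# Hypothetical helper: writes seven preamble lines, a header line, then data.
write_sample <- function(path, ids) {
  preamble <- paste("metadata line", 1:7)
  body <- c("id,value", paste(ids, ids * 10, sep = ","))
  writeLines(c(preamble, body), path)
}
write_sample(file.path(tmp_dir, "a.csv"), 1:2)
write_sample(file.path(tmp_dir, "b.csv"), 3:5)

# full.names = TRUE returns paths that work from any working directory.
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Skip the seven preamble lines; line 8 of each file becomes the header.
frames <- lapply(files, read.csv, skip = 7, header = TRUE)

# Stack the per-file data frames into a single data frame.
combined <- do.call(rbind, frames)
str(combined)
```

Passing skip and header straight through lapply does the same job as the anonymous function above. Note that rbind needs the column names to match across files, which the question says they do.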

If you do need the speed provided by data.table::fread, you could modify the code as follows:

library(data.table)

# create f, which is a list of data.tables, this time read with fread
f <- lapply(files, function(m) fread(m, skip = 7))

# use rbindlist this time
f_combine <- rbindlist(f)
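
If you also want to know which file each row came from, one option is to name the list elements and pass idcol to rbindlist. This is only a sketch: it assumes the data.table package is installed, and the sample files, the helper write_sample, and the column name source are invented for the example.

```r
library(data.table)

# Create a fresh temporary directory with two hypothetical sample files.
tmp_dir <- tempfile("fread_demo")
dir.create(tmp_dir)

# Hypothetical helper: seven preamble lines, a header line, then data.
write_sample <- function(path, ids) {
  preamble <- paste("metadata line", 1:7)
  body <- c("id,value", paste(ids, ids * 10, sep = ","))
  writeLines(c(preamble, body), path)
}
write_sample(file.path(tmp_dir, "a.csv"), 1:2)
write_sample(file.path(tmp_dir, "b.csv"), 3:5)

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file from line 8 onward, treating that line as the header.
tables <- lapply(files, fread, skip = 7, header = TRUE)

# Name each element after its file so rbindlist can record the origin.
names(tables) <- basename(files)

# idcol adds a column (here called "source") holding the list element names.
combined_dt <- rbindlist(tables, idcol = "source")
print(combined_dt)
```

The result is a data.table; wrap it in as.data.frame() if a plain data frame is needed downstream.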


 