
Read, process and export analysis results from multiple .csv files in R

I have a bunch of CSV files and I would like to perform the same analysis (in R) on the data within each file. Firstly, I assume each file must be read into R (as opposed to running something directly on the CSV that produces output, the way a sed script would).

What is the best way to read numerous CSV files into R, perform the analysis, and then output separate results for each input?

Thanks (btw, I'm a complete R newbie)

You could go for Sean's option (reading each file into its own object with assign()), but it's going to lead to several problems:

  1. You'll end up with a lot of unrelated objects in the environment, each with the same name as the file it came from. This is a problem because...
  2. For loops can be pretty slow, and because you've got this big pile of unrelated objects, you're going to have to rely on for loops over the filenames for each subsequent piece of analysis. Otherwise, how the heck are you going to remember what the objects are named so that you can call them?
  3. Calling objects by pasting their names in as strings (which you'll have to do because, again, your only record of what each object is called is that list of strings) is a real pain. Have you ever tried to call an object when you can't write its name in the code? I have, and it's horrifying.
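One way around all three problems is to keep every data frame in a single named list. The file names become list names, so each data set can be reached with [[ ]] instead of pasting a string into get(). Here is a minimal sketch of that idea. It assumes the files start directly with a header row. The temporary folder and the two small files (site_a.csv, site_b.csv) are made up purely for illustration.

```r
# Scratch folder with two small, made-up CSV files (illustrative only)
csv_dir <- file.path(tempdir(), "csv_demo_list")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3, y = c(2, 4, 6)),
          file.path(csv_dir, "site_a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:2, y = c(5, 7)),
          file.path(csv_dir, "site_b.csv"), row.names = FALSE)

# Full paths of every file ending in .csv (pattern is a regular expression)
paths <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read all files into one list, named by file name without the extension
all_data <- lapply(paths, read.csv)
names(all_data) <- tools::file_path_sans_ext(basename(paths))

# Each data set is reachable by name, and the whole collection can be
# processed with one more sapply()/lapply() call instead of a loop over strings
print(all_data[["site_a"]])
row_counts <- sapply(all_data, nrow)
print(row_counts)
```

Because the list keeps the names and the data together, the next step of the analysis is just another lapply() over all_data.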

A better way of doing it might be with lapply():

# List the CSV files in the working directory
# (pattern is a regular expression, so "\\.csv$" means "ends in .csv")
filelist <- list.files(pattern = "\\.csv$")

# Now we use lapply to perform a set of operations
#   on each entry in the list of filenames.
to_dispose_of <- lapply(filelist, function(x) {

    # Read in the file specified by 'x', an entry in filelist.
    # skip = 1 skips one line above the header row; drop it if your files
    #   start directly with the header.
    data.df <- read.csv(x, skip = 1, header = TRUE)

    # Store the filename, minus .csv. This will be important later.
    filename <- sub("\\.csv$", "", x)

    # Your analysis work goes here. You only have to write it out once
    #   to perform it on each individual file.
    # (Placeholder: replace this line with your own analysis.)
    data_to_output <- data.df

    # Eventually you'll end up with a data frame or a vector of analysis
    #   to write out. Great! Since you've kept the filename around,
    #   you can do that trivially. Note that the output also ends in .csv,
    #   so a second run of this script would read it back in as input.
    write.csv(data_to_output,
              file = paste0(filename, "_analysis.csv"),
              row.names = FALSE)
})
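Here is a self-contained sketch that runs the lapply() approach end to end. It rests on several assumptions that are not part of the answer. The analysis is made up: a hypothetical helper called analyse() takes the mean of each numeric column. The two input files are invented and live in a temporary folder. They start directly with the header, so skip = 1 is dropped. The results also go to a separate folder. That is a design choice, made so that re-running the script does not pick up the *_analysis.csv files as new input.

```r
# Scratch input and output folders (names are illustrative)
in_dir  <- file.path(tempdir(), "csv_demo_in")
out_dir <- file.path(tempdir(), "csv_demo_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Two small, made-up input files
write.csv(data.frame(a = c(1, 3), b = c(10, 20)),
          file.path(in_dir, "run1.csv"), row.names = FALSE)
write.csv(data.frame(a = c(2, 4, 6), b = c(1, 1, 1)),
          file.path(in_dir, "run2.csv"), row.names = FALSE)

# Hypothetical per-file analysis: the mean of every numeric column
analyse <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(column = names(num), mean = colMeans(num), row.names = NULL)
}

filelist <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# invisible() hides the list of NULLs that write.csv() returns
invisible(lapply(filelist, function(path) {
  data.df  <- read.csv(path)
  filename <- tools::file_path_sans_ext(basename(path))
  # Results go to a separate folder, so a second run does not read them back in
  write.csv(analyse(data.df),
            file = file.path(out_dir, paste0(filename, "_analysis.csv")),
            row.names = FALSE)
}))

print(list.files(out_dir))
```

To use this on your own data, point in_dir at your folder and replace analyse() with your own function. The rest of the loop stays the same.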

And you're done.

You can try the following code after putting all the CSV files in the same directory.

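csv_files <- list.files(pattern = "\\.csv$")  # CSV file names in the working directory
# Read each file into its own object, named after the file (including ".csv")
for (i in seq_along(csv_files)) assign(csv_files[i], read.csv(csv_files[i], skip = 1, header = TRUE))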
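With this approach, each object is named after its file, extension included. A name like first.csv is not a syntactically valid R name, so you have to reach it with backticks or get(). The sketch below shows this, and then uses mget() to gather the objects back into a single named list. It uses two made-up files in a temporary folder. They start directly with the header, so skip = 1 is left out.

```r
# Scratch folder with two small, made-up CSV files (illustrative only)
csv_dir <- file.path(tempdir(), "csv_demo_assign")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(v = 1:4), file.path(csv_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(v = 5:6), file.path(csv_dir, "second.csv"), row.names = FALSE)

csv_files <- list.files(csv_dir, pattern = "\\.csv$")
# One object per file, named after the file, as in the loop above
for (f in csv_files) assign(f, read.csv(file.path(csv_dir, f)))

# The names contain ".csv", so they must be quoted: backticks or get()
print(nrow(`first.csv`))
print(nrow(get("second.csv")))

# mget() collects the objects back into one named list
collected <- mget(csv_files)
print(sapply(collected, nrow))
```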

Hope this helps!

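Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.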
