
lapply r to one column of a csv file

I have a folder with several hundred csv files. I want to use lapply to calculate the mean of one column within each csv file and save that value into a new csv file that would have two columns: Column 1 would be the name of the original file. Column 2 would be the mean value for the chosen field from the original file. Here's what I have so far:

setwd("C:/~~~~")
list.files()
filenames <- list.files()
read_csv <- lapply(filenames, read.csv, header = TRUE)
dataset <- lapply(filenames[1], mean)
write.csv(dataset, file = "Expected_Value.csv")

Which gives the warning message:

Warning message:
In mean.default("2pt.csv"[[1L]], ...) : argument is not numeric or logical: returning NA

So I think I have (at least) two problems that I cannot figure out.

First, why doesn't R recognize that column 1 is numeric? I double and triple checked the csv files and I'm sure this column is numeric.

Second, how do I get the output file to return two columns the way I described above? I haven't gotten far with the second part yet.

I wanted to get the first part to work first. Any help is appreciated.

I didn't use lapply but have done something similar. Hope this helps!

    ## directory from which all files are to be read
    directory <- "C:/mydir/"

    ## read all csv file names from the directory
    x <- list.files(directory, pattern = "\\.csv$")
    xpath <- paste0(directory, x)

    ## create empty data frame to collect the results
    df <- NULL

    ## for loop to read each file and save the metric and file name
    ## (replace seq_along(xpath) with e.g. 1:2 to process only some files)
    for (i in seq_along(xpath))
    {
        file <- read.csv(xpath[i], header = TRUE, sep = ",")
        first_col <- file[, 1]
        ## one row per file: mean of the first column plus the file name
        d <- data.frame(mean = mean(first_col), filename = x[i])
        df <- rbind(df, d)
    }

    ## write all output to csv
    write.csv(df, file = "C:/mydir/final.csv", row.names = FALSE)

The CSV file looks like this:

    mean         filename
    1999.000661  hist_03082015.csv
    1999.035121  hist_03092015.csv
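
For reference, the same thing can be done with lapply, which is what the question asked about. The warning in the question appears because `lapply(filenames[1], mean)` hands the file name (a character string) to `mean`, not the data that was read from the file. Below is a minimal sketch, assuming the first column of every file is numeric. The temporary directory, the two sample files and the names `paths`, `tables`, `means`, `result` and `out_file` are made up so that the example runs on its own:

```r
# Hypothetical sample data: two small csv files in a temporary directory
directory <- file.path(tempdir(), "csv_means_demo")
dir.create(directory, showWarnings = FALSE)
write.csv(data.frame(points = c(2, 0, 2)),
          file.path(directory, "2pt.csv"), row.names = FALSE)
write.csv(data.frame(points = c(3, 0, 0)),
          file.path(directory, "3pt.csv"), row.names = FALSE)

# Full paths of all csv files in the directory
paths <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames
tables <- lapply(paths, read.csv, header = TRUE)

# Mean of the first column of each data frame (not of the file name)
means <- sapply(tables, function(tab) mean(tab[[1]], na.rm = TRUE))

# Two columns: original file name and mean of the chosen column
result <- data.frame(filename = basename(paths), mean = means)

# Write the summary outside the input directory so a rerun does not read it back
out_file <- file.path(tempdir(), "Expected_Value.csv")
write.csv(result, out_file, row.names = FALSE)
```

To use a column other than the first, `tab[[1]]` could be replaced by the column name, for example `tab[["points"]]`.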

Thanks for the two answers. After much review, it turns out that there was a much easier way to accomplish my goal. The csv files that I had were originally in one file. I split them into multiple files by location. At the time, I thought this was necessary to calculate mean on each type. Clearly, that was a mistake. I went to the original file and used aggregate. Code:

setwd("C:/~~")
allshots <- read.csv("All_Shots.csv", header=TRUE)
EV <- aggregate(allshots$points, list(Location = allshots$Loc), mean)
write.csv(EV, file= "EV_location.csv")

This was a simple solution. Thanks again for the answers. I'll need to get better at lapply for future projects, so they were not a waste of time.
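
One detail about the aggregate call: when it is given a bare vector as above, the value column in the result is named `x`. The formula interface keeps the original column names instead. Here is a small sketch with invented data; only the column names `points` and `Loc` come from the code above, while the values and the names `ev_vector` and `ev_formula` are made up for illustration:

```r
# Made-up shot data; only the column names come from the post
allshots <- data.frame(
  Loc    = c("corner", "corner", "paint", "paint"),
  points = c(3, 0, 2, 2)
)

# Vector interface, as in the post: the value column is called "x"
ev_vector <- aggregate(allshots$points, list(Location = allshots$Loc), mean)

# Formula interface: the value column keeps the name "points"
ev_formula <- aggregate(points ~ Loc, data = allshots, FUN = mean)

print(ev_vector)
print(ev_formula)
```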


 