Applying custom functions to each row in a data frame in R

I am trying to use each row in a data frame as input to a function that processes some data and then writes the output to a CSV file, as in the following example:

myfunction <- function(X, Y, Z){
  data <- read.csv("mydata.csv")
  subsetedData <- subset(data, x=X & y=Y & z=Z, select=x:z)
  write.csv(subsetedData, file="mycsvfile.csv")
}

apply(myXYZdata, MARGIN = 1, function(x1, x2, x3) myfunction(X, Y, Z))

I want to subset based on every row in the data frame myXYZdata. However, this does not appear to work, or I do not fully understand the correct usage of apply.

I know this can be done using a loop, but I would prefer not to do it that way.

Edit:

The purpose of this is that I have a large data file which I want to subset based on the combinations of variables found in my data frame "myXYZdata", storing the results in new data files.

The large data file I want to subset is in the following format:

  date                    x y z count
1 2015-08-20 00:00:00.000 a d h    56
2 2015-08-26 00:00:00.000 b e h     4
3 2015-08-18 00:00:00.000 b f i     8
4 2015-09-03 00:00:00.000 c e l    32
5 2015-08-12 00:00:00.000 a g l     3

I believe it's easier to pass a row as a single argument to your function.

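myfunction <- function(row){
  # Read the large data file
  data <- read.csv("mydata.csv")
  # Keep rows matching this combination; use == (comparison), not = (assignment)
  subsetedData <- subset(data, x == row[1] & y == row[2] & z == row[3], select = x:z)
  # Note: every call writes to the same file, so earlier output is overwritten
  write.csv(subsetedData, file = "mycsvfile.csv")
}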

apply(myXYZdata[,c("X","Y","Z")], MARGIN = 1, myfunction)
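To see what apply hands to the function, here is a minimal sketch that replaces the file I/O with in-memory data. The data frames big and combos and the helper count_matches are hypothetical stand-ins, not part of the original question. With MARGIN = 1, apply first converts the data frame to a matrix, so each row arrives as one named character vector rather than as three separate arguments.

```r
# Hypothetical stand-in for the contents of the large data file
big <- data.frame(
  x = c("a", "b", "b", "c", "a"),
  y = c("d", "e", "f", "e", "g"),
  z = c("h", "h", "i", "l", "l"),
  count = c(56, 4, 8, 32, 3),
  stringsAsFactors = FALSE
)

# Hypothetical combinations to look up, one per row (plays the role of myXYZdata)
combos <- data.frame(
  X = c("a", "b"),
  Y = c("d", "f"),
  Z = c("h", "i"),
  stringsAsFactors = FALSE
)

# apply() calls this once per row, passing the whole row as a single vector
count_matches <- function(row) {
  hits <- big[big$x == row[["X"]] & big$y == row[["Y"]] & big$z == row[["Z"]], ]
  nrow(hits)
}

n_matches <- apply(combos[, c("X", "Y", "Z")], MARGIN = 1, count_matches)
print(n_matches)
```

Because of the matrix conversion, a row that mixes numbers and text reaches the function as all character, which is worth keeping in mind when the key columns are not all strings.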

What about using mapply (multivariate apply):

mapply(myfunction, myXYZdata$X, myXYZdata$Y, myXYZdata$Z, fnms)

You will need to create a vector of file names (fnms) so that each entry is written to a different file, and then change myfunction so that it takes an argument for the file name.
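A minimal sketch of that suggestion, under some assumptions: the data frame is built in memory instead of being read from "mydata.csv", the output goes to a temporary directory, and the file-name pattern and the fourth argument fnm are hypothetical choices made only for illustration.

```r
# Hypothetical stand-in for read.csv("mydata.csv")
data <- data.frame(
  x = c("a", "b", "b", "c", "a"),
  y = c("d", "e", "f", "e", "g"),
  z = c("h", "h", "i", "l", "l"),
  count = c(56, 4, 8, 32, 3),
  stringsAsFactors = FALSE
)

# Hypothetical combinations, one per row
myXYZdata <- data.frame(
  X = c("a", "b"),
  Y = c("d", "f"),
  Z = c("h", "i"),
  stringsAsFactors = FALSE
)

# Variant of myfunction that also takes the output file name (assumed signature)
myfunction <- function(X, Y, Z, fnm) {
  subsetedData <- data[data$x == X & data$y == Y & data$z == Z, c("x", "y", "z")]
  write.csv(subsetedData, file = fnm, row.names = FALSE)
  nrow(subsetedData)  # return the number of rows written
}

# One output file per row of myXYZdata
fnms <- file.path(tempdir(), paste0("subset_", seq_len(nrow(myXYZdata)), ".csv"))

# mapply walks the four vectors in parallel, one call per position
rows_written <- mapply(myfunction, myXYZdata$X, myXYZdata$Y, myXYZdata$Z, fnms)
print(rows_written)
```

The sketch uses bracket indexing rather than subset so that the column names and the function arguments cannot be confused with each other.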

Alternatively, append all results to the same file. Note that write.csv ignores append (with a warning), so myfunction would need write.table(..., sep = ",", append = TRUE) instead. Be aware that successive runs of the code will then keep adding to the file rather than overwrite it; you could run if(file.exists("mycsvfile.csv")) file.remove("mycsvfile.csv") once before the mapply call.
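A minimal sketch of the single-file variant, assuming in-memory data and a temporary output path (out_file and append_subset are hypothetical names). It uses write.table with sep = "," because write.csv does not honor append, and it writes the header only on the first call.

```r
# Hypothetical stand-in for read.csv("mydata.csv")
data <- data.frame(
  x = c("a", "b", "b", "c", "a"),
  y = c("d", "e", "f", "e", "g"),
  z = c("h", "h", "i", "l", "l"),
  count = c(56, 4, 8, 32, 3),
  stringsAsFactors = FALSE
)

# Hypothetical combinations, one per row
myXYZdata <- data.frame(
  X = c("a", "b"),
  Y = c("d", "f"),
  Z = c("h", "i"),
  stringsAsFactors = FALSE
)

out_file <- file.path(tempdir(), "combined_subsets.csv")

# Start from a clean file so repeated runs do not accumulate rows
if (file.exists(out_file)) file.remove(out_file)

append_subset <- function(X, Y, Z) {
  hits <- data[data$x == X & data$y == Y & data$z == Z, ]
  first_write <- !file.exists(out_file)
  # Header on the first write only; later calls append rows without it
  write.table(hits, file = out_file, sep = ",", row.names = FALSE,
              col.names = first_write, append = !first_write)
  nrow(hits)
}

invisible(mapply(append_subset, myXYZdata$X, myXYZdata$Y, myXYZdata$Z))

combined <- read.csv(out_file)
print(combined)
```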
