
Apply function to chunks of a data frame

I'm a C# programmer who's been asked to do some work in R. I need to figure out how to call a function multiple times, passing in 'chunks' of a data frame: the function should be called once for each distinct combination of values in the first two columns.

Here's what I mean:

Stratum<-c("FPN", "FPN", "FPN", "MPN", "MPN", "MPN")
Cal<-c("ynnn", "ynnn", "yynn", "ynnn", "ynnn", "yynn")
Band.1<-c(1,2,1,1,2,1)
Band.2<-c(2,3,2,2,3,2)
Regroup<-c("No","Yes","No","Yes","No","No")
decs.data<-data.frame(Stratum,Cal,Band.1,Band.2,Regroup,stringsAsFactors=FALSE)

Stratum  Cal Band.1 Band.2 Regroup
    FPN ynnn      1      2      No
    FPN ynnn      2      3     Yes
    FPN yynn      1      2      No
    MPN ynnn      1      2     Yes
    MPN ynnn      2      3      No
    MPN yynn      1      2      No

For the above data I'd call the function four times: once passing it all the rows of decs.data where Stratum="FPN" and Cal="ynnn", then where Stratum="FPN" and Cal="yynn", and so on.

The function won't operate on those rows; it uses them to determine which data file to load from disk and what to do with it.

How would I go about calling a function this way in R? I'm sure 'apply' must be involved but I'm struggling to figure out how.

UPDATE: I don't need all the rows in the data.frame as arguments to the function, just the matching ones (i.e. rows 1 & 2 for the 1st call, 3 for the 2nd, 4 & 5 for the 3rd and 6 for the 4th).

The function will load a data file based on the Stratum & Cal columns (e.g. FPN.ynnn.rdata), then decide how to process it based on the Band.1, Band.2 and Regroup columns.

Essentially, decs.data is not the data I want to manipulate but a decision matrix defining which bands in which rdata files need to be regrouped.

You are looking for by(). If you want to run your function on subsets of decs.data, using Stratum and Cal as the splitting variables, you can do:

by(decs.data, decs.data[c('Stratum','Cal')], your_function)

where your_function is the name of your function. (function itself is a reserved word in R, so it cannot be passed literally.)
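As a rough sketch of how this might look with the data from the question: process_chunk below is a hypothetical stand-in for the real function. It only builds the file name from Stratum and Cal and counts the rows it was given; the actual loading and regrouping logic is not shown in the question, so it is left out here.

```r
# Decision matrix from the question.
decs.data <- data.frame(
  Stratum = c("FPN", "FPN", "FPN", "MPN", "MPN", "MPN"),
  Cal     = c("ynnn", "ynnn", "yynn", "ynnn", "ynnn", "yynn"),
  Band.1  = c(1, 2, 1, 1, 2, 1),
  Band.2  = c(2, 3, 2, 2, 3, 2),
  Regroup = c("No", "Yes", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Hypothetical placeholder for the real function (an assumption, not the
# asker's code). It receives one chunk: all rows sharing a Stratum/Cal pair.
process_chunk <- function(chunk) {
  # Every row in the chunk has the same Stratum and Cal, so use the first.
  file_name <- paste(chunk$Stratum[1], chunk$Cal[1], "rdata", sep = ".")
  # The real function would load file_name here and use Band.1, Band.2 and
  # Regroup to decide what to do; this sketch only reports what it was given.
  data.frame(file = file_name, n_rows = nrow(chunk), stringsAsFactors = FALSE)
}

# One call to process_chunk per distinct Stratum/Cal combination.
results <- by(decs.data, decs.data[c("Stratum", "Cal")], process_chunk)

# Stack the per-chunk results into a single data frame.
summary_table <- do.call(rbind, results)
```

by() returns one cell per combination of the splitting variables, so a combination that never occurs in the data would come back as NULL rather than triggering a call. If a plain list of chunks is easier to work with, split(decs.data, decs.data[c("Stratum", "Cal")], drop = TRUE) followed by lapply() should give a similar result.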
