User-defined function with data.table aggregation
I'm trying to write a function that mainly aggregates, merges, and subsets data sets. My data frame looks like this:
NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8
The function call I want to run is:
test <- Fun(data, Score1, NameB)
First, I want to calculate the mean of Score1, grouped by NameB:
Fun <- function(df, col, group_by){
setDT(df)
df1<- df[, sapply(.SD, mean), .SDcols = col, by= group_by]
}
After some extra coding, my data frame becomes:
NameA NameB NameC Score1 Score2 Group_Mean
A F K 3 3 4
B F L 5 5 4
C F M 4 4 4
D G N 2 2 5
E G O 5 8 5
Then, I want to subset my data frame to the rows where Score1 != Score2. So I wrote:
Fun <- function(df, col, group_by){
setDT(df)
df1<- df[, sapply(.SD, mean), .SDcols = col, by= group_by]
df2 <- df1[which(df1[col] != df[Score2])]
}
but this gives me an error message:
Error in Ops.data.frame(df2[col], df[Score2]) : '==' only defined for equally-sized data frames
After this step, I want to do some more math and subsetting, as below:
Fun <- function(df, col, group_by){
setDT(df)
df1<- df[, sapply(.SD, mean), .SDcols = col, by= group_by]
df2 <- df1[which(df1[col] != df[Score2])]
df2["NewCol"] <- abs(df2[col] - df2[Score2])
output <- df2[which(df2[NewCol] > 1 or df2[NewCol] < 1.5)]
return(output)
}
I'm new to R and to user-defined functions in R, and I have been stuck at the error message for a long time. If anyone can give me suggestions on the code above, I would really appreciate it!
I am not sure if it is wise to encourage an R novice to enter a wild mix of data.table syntax and function calls.
However, here are some sample functions.
library(data.table)
data <- fread(
"NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8"
)
Fun1 <- function(df, col, group_by){
setDT(df)[, sapply(.SD, mean), .SDcols = col, by = group_by]
}
Fun1(data, "Score1", "NameB")
   NameB  V1
1:     F 5.0
2:     G 3.5
Note that Score2 is used in the next example to reproduce the data frame depicted by the OP:
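The result column is called V1 because sapply() returns an atomic vector rather than a list, so data.table has no column name to carry over. As a minimal sketch, assuming the original column name should be kept, lapply() can be used instead. The name Fun1_named is hypothetical and not part of the original answer.

```r
library(data.table)

data <- fread("NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8")

# Hypothetical variant of Fun1: lapply() returns a named list,
# so the aggregated column keeps its name (Score1) instead of V1.
Fun1_named <- function(df, col, group_by) {
  setDT(df)[, lapply(.SD, mean), .SDcols = col, by = group_by]
}

res1 <- Fun1_named(data, "Score1", "NameB")
print(res1)
```

The values are the same as those returned by Fun1; only the column name differs.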
Fun2 <- function(df, col, group_by){
setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
}
Fun2(data, "Score2", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean
1:     A     F     K      3      3          4
2:     B     F     L      5      5          4
3:     C     F     M      7      4          4
4:     D     G     N      2      2          5
5:     E     G     O      5      8          5
Example 3:
Fun3 <- function(df, col, group_by){
setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
df[get(col) != Score2]
}
Fun3(data, "Score1", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean
1:     C     F     M      7      4        5.0
2:     E     G     O      5      8        3.5
Note that the function below has been modified with respect to the OP's draft in order to return a non-empty data.table:
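Fun3 looks up the first column through get(col) but hard-codes Score2. As a minimal sketch, assuming the second column should also be configurable, the same pattern can be applied to both. The name Fun3_param and the argument other_col are hypothetical and not part of the original answer.

```r
library(data.table)

data <- fread("NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8")

# Hypothetical variant of Fun3: both compared columns are passed as strings.
Fun3_param <- function(df, col, other_col, group_by) {
  # Add the group mean by reference, as in the original Fun3
  setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
  # get() resolves each string to the column of that name
  df[get(col) != get(other_col)]
}

res3 <- Fun3_param(data, "Score1", "Score2", "NameB")
print(res3)
```

With the sample data this should keep the rows for C and E, the only ones where Score1 and Score2 differ.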
Fun4 <- function(df, col, group_by){
setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
df[, NewCol := abs(get(col) - Group_Mean)]
df[between(NewCol, 1.0, 1.5, incbounds = TRUE)]
}
Fun4(data, "Score1", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean NewCol
1:     D     G     N      2      2        3.5    1.5
2:     E     G     O      5      8        3.5    1.5
Note that data has been modified in place by all previous function calls:
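The OP's draft combined two conditions with `or`, which is not an R operator; element-wise conditions are combined with `|` and `&`. The short sketch below, on a made-up vector x, shows that between() with incbounds = TRUE corresponds to a closed-interval test written with `&`. Reading the OP's condition as a range check is an assumption.

```r
library(data.table)

# Made-up values for illustration only
x <- c(0, 1, 1.2, 1.5, 2)

# Closed interval [1, 1.5]: both bounds are included
inclusive <- between(x, 1.0, 1.5, incbounds = TRUE)

# The same test written with base R comparison operators
manual <- x >= 1.0 & x <= 1.5

# Open interval (1, 1.5): both bounds are excluded
exclusive <- between(x, 1.0, 1.5, incbounds = FALSE)

print(data.table(x, inclusive, manual, exclusive))
```

Note that a condition of the form `x > 1 | x < 1.5` would be TRUE for every non-missing number, which is presumably not what was intended.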
data
   NameA NameB NameC Score1 Score2 Group_Mean NewCol
1:     A     F     K      3      3        5.0    2.0
2:     B     F     L      5      5        5.0    0.0
3:     C     F     M      7      4        5.0    2.0
4:     D     G     N      2      2        3.5    1.5
5:     E     G     O      5      8        3.5    1.5
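If the side effect on data is not wanted, one option is to work on a copy inside the function. The following is a minimal sketch under that assumption; the name Fun2_copy is hypothetical and not part of the original answer.

```r
library(data.table)

data <- fread("NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8")

# Hypothetical variant of Fun2 that leaves its input untouched.
Fun2_copy <- function(df, col, group_by) {
  # copy() makes a deep copy, so := below cannot alter the caller's object
  dt <- copy(as.data.table(df))
  dt[, Group_Mean := mean(get(col)), by = group_by]
  dt[]
}

res2 <- Fun2_copy(data, "Score2", "NameB")
print(names(data))  # still the five original columns
print(res2)
```

The trade-off is memory: copying a large table on every call can be expensive, which is why data.table updates by reference by default.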