User-defined function with data.table aggregation

I'm trying to write a function that mainly aggregates, merges, and subsets data sets. My data frame looks like this:

NameA   NameB   NameC   Score1   Score2
  A       F       K       3         3
  B       F       L       5         5
  C       F       M       7         4
  D       G       N       2         2
  E       G       O       5         8

and the function I will run is:

test <- Fun(data, Score1, NameB)

First, I want to calculate the mean of Score1, grouped by NameB:

Fun <- function(df, col, group_by){
       setDT(df)
       df1<- df[, sapply(.SD, mean),  .SDcols = col, by= group_by]
    }

After some extra coding, my data frame becomes:

NameA   NameB   NameC   Score1   Score2   Group_Mean
  A       F       K       3         3          4
  B       F       L       5         5          4
  C       F       M       4         4          4
  D       G       N       2         2          5
  E       G       O       5         8          5

Then, I want to subset my data frame to the rows where Score1 != Score2. So I write:

Fun <- function(df, col, group_by){
       setDT(df)
       df1<- df[, sapply(.SD, mean),  .SDcols = col, by= group_by]           
       df2 <- df1[which(df1[col] != df[Score2])]
}

but this gives me an error message: Error in Ops.data.frame(df2[col], df[Score2]) : '==' only defined for equally-sized data frames

After this step, I want to do some more math and subsetting, as below:

Fun <- function(df, col, group_by){
       setDT(df)
       df1<- df[, sapply(.SD, mean),  .SDcols = col, by= group_by]           
       df2 <- df1[which(df1[col] != df[Score2])]

       df2["NewCol"] <- abs(df2[col] - df2[Score2])
       output <- df2[which(df2[NewCol] > 1 or df2[NewCol] < 1.5)]
       return(output)
    }

I'm new to R and to user-defined functions in R. I have been stuck at the error message for a long time. If anyone can give me suggestions on the code above, I would really appreciate it!

I am not sure whether it is wise to encourage an R novice to dive into a wild mix of data.table syntax and function calls.

However, here are some sample functions.

library(data.table)

data <- fread(
  "NameA   NameB   NameC   Score1   Score2
  A       F       K       3         3
  B       F       L       5         5
  C       F       M       7         4
  D       G       N       2         2
  E       G       O       5         8"
)

Fun1 <- function(df, col, group_by){
  setDT(df)[, sapply(.SD, mean),  .SDcols = col, by = group_by]
}
Fun1(data, "Score1", "NameB")
   NameB  V1
1:     F 5.0
2:     G 3.5
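As a side note on the `V1` column name above: a minimal sketch, assuming the goal is to keep the original column name in the result, is to use `lapply()` instead of `sapply()` in `j`. The name `Fun1b` is hypothetical and not part of the original answer; the data are rebuilt with `data.table()` only to keep the block self-contained.

```r
library(data.table)

# Same sample data as above, built directly instead of via fread()
data <- data.table(
  NameA  = c("A", "B", "C", "D", "E"),
  NameB  = c("F", "F", "F", "G", "G"),
  NameC  = c("K", "L", "M", "N", "O"),
  Score1 = c(3, 5, 7, 2, 5),
  Score2 = c(3, 5, 4, 2, 8)
)

# Fun1b is a hypothetical variant of Fun1: lapply() returns a list,
# so data.table keeps the column name given in .SDcols
Fun1b <- function(df, col, group_by) {
  setDT(df)[, lapply(.SD, mean), .SDcols = col, by = group_by]
}

res1b <- Fun1b(data, "Score1", "NameB")
print(res1b)
```

With this variant the aggregated column should be called `Score1` rather than `V1`, which tends to make later joins or merges easier to read.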

Note that Score2 is used in the next example to reproduce the data frame shown in OP's question:

Fun2 <- function(df, col, group_by){
  setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
}
Fun2(data, "Score2", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean
1:     A     F     K      3      3          4
2:     B     F     L      5      5          4
3:     C     F     M      7      4          4
4:     D     G     N      2      2          5
5:     E     G     O      5      8          5

Example 3:

Fun3 <- function(df, col, group_by){
  setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
  df[get(col) != Score2]
}
Fun3(data, "Score1", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean
1:     C     F     M      7      4        5.0
2:     E     G     O      5      8        3.5

Note that the function below has been modified with respect to OP's draft so that it returns a non-empty data.table:

Fun4 <- function(df, col, group_by){
  setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
  df[, NewCol := abs(get(col) - Group_Mean)]
  df[between(NewCol, 1.0, 1.5, incbounds = TRUE)]
}
Fun4(data, "Score1", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean NewCol
1:     D     G     N      2      2        3.5    1.5
2:     E     G     O      5      8        3.5    1.5
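OP's draft wrote the range condition with `or`, which is not an R operator; element-wise logic in R uses `|` and `&`. The following is only a sketch of how `between()` relates to those operators, using the `NewCol` values from the full table as a plain vector. The variable names are made up for illustration.

```r
library(data.table)

# NewCol values for the five rows (abs(Score1 - Group_Mean))
new_col <- c(2, 0, 2, 1.5, 1.5)

# between() with incbounds = TRUE is a closed-interval test ...
in_range_between <- between(new_col, 1.0, 1.5, incbounds = TRUE)

# ... which matches combining two comparisons with `&`
in_range_and <- new_col >= 1.0 & new_col <= 1.5

# Joining the two comparisons with `|` instead is true for every number,
# so it would not filter anything
either <- new_col > 1 | new_col < 1.5

print(in_range_between)
print(either)
```

This suggests that a literal translation of `> 1 or < 1.5` into `|` would keep every row, which may not be what OP intended.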

Note that data has been modified in place by all of the previous function calls:

data
   NameA NameB NameC Score1 Score2 Group_Mean NewCol
1:     A     F     K      3      3        5.0    2.0
2:     B     F     L      5      5        5.0    0.0
3:     C     F     M      7      4        5.0    2.0
4:     D     G     N      2      2        3.5    1.5
5:     E     G     O      5      8        3.5    1.5
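If modifying the caller's object is not wanted, one possible approach is to work on a deep copy inside the function. This is a sketch under that assumption, not part of the original answer; `Fun2_copy` is a hypothetical name, and the sample data are rebuilt so the block runs on its own.

```r
library(data.table)

data <- data.table(
  NameA  = c("A", "B", "C", "D", "E"),
  NameB  = c("F", "F", "F", "G", "G"),
  NameC  = c("K", "L", "M", "N", "O"),
  Score1 = c(3, 5, 7, 2, 5),
  Score2 = c(3, 5, 4, 2, 8)
)

# Hypothetical variant of Fun2 that leaves its input untouched
Fun2_copy <- function(df, col, group_by) {
  out <- copy(df)   # deep copy, so := below cannot reach the caller's object
  setDT(out)        # converts the local copy only (no-op for a data.table)
  out[, Group_Mean := mean(get(col)), by = group_by]
  out[]             # trailing [] so the result prints when returned
}

res_copy <- Fun2_copy(data, "Score1", "NameB")
print(names(data))      # original columns only
print(names(res_copy))  # original columns plus Group_Mean
```

The trade-off is an extra copy of the data on every call, which may matter for large tables.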
