User-defined function with data.table aggregation

Question

I'm trying to write a function that mainly aggregates, merges, and subsets data sets. My data frame looks like:

NameA   NameB   NameC   Score1   Score2
  A       F       K       3         3
  B       F       L       5         5
  C       F       M       7         4
  D       G       N       2         2
  E       G       O       5         8

and the function I will run is:

test <- Fun(data, Score1, NameB)

First, I want to calculate the mean of Score1, grouped by NameB:

Fun <- function(df, col, group_by){
       setDT(df)
       df1<- df[, sapply(.SD, mean),  .SDcols = col, by= group_by]
    }

After some extra coding, my data frame becomes:

NameA   NameB   NameC   Score1   Score2   Group_Mean
  A       F       K       3         3          4
  B       F       L       5         5          4
  C       F       M       4         4          4
  D       G       N       2         2          5
  E       G       O       5         8          5

Then, I want to subset my data frame to the rows where Score1 != Score2. So I write:

Fun <- function(df, col, group_by){
       setDT(df)
       df1<- df[, sapply(.SD, mean),  .SDcols = col, by= group_by]           
       df2 <- df1[which(df1[col] != df[Score2])]
}

but this gives me an error message:

Error in Ops.data.frame(df2[col], df[Score2]) : '==' only defined for equally-sized data frames

After this step, I want to do some more math and subset as below:

Fun <- function(df, col, group_by){
       setDT(df)
       df1<- df[, sapply(.SD, mean),  .SDcols = col, by= group_by]           
       df2 <- df1[which(df1[col] != df[Score2])]

       df2["NewCol"] <- abs(df2[col] - df2[Score2])
       output <- df2[which(df2[NewCol] > 1 or df2[NewCol] < 1.5)]
       return(output)
    }

I'm new to R and to user-defined functions in R. I have been stuck at the error message for a long time. If anyone can give me suggestions on my code above, I would really appreciate it!

Answer 1

I am not sure if it is wise to encourage an R novice to enter a wild mix of data.table syntax and function calls.

However, here are some sample functions.

library(data.table)

data <- fread(
  "NameA   NameB   NameC   Score1   Score2
  A       F       K       3         3
  B       F       L       5         5
  C       F       M       7         4
  D       G       N       2         2
  E       G       O       5         8"
)

Fun1 <- function(df, col, group_by){
  setDT(df)[, sapply(.SD, mean),  .SDcols = col, by = group_by]
}
Fun1(data, "Score1", "NameB")

   NameB  V1
1:     F 5.0
2:     G 3.5
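As a side note, the result column above is called V1 because sapply() returns an unnamed vector. A minimal sketch of a variant that keeps the original column name by returning a list from lapply() instead; Fun1b is a name made up for this illustration and is not part of OP's code:

```r
library(data.table)

# Same sample data as in the question, built directly for self-containment
data <- data.table(
  NameA  = c("A", "B", "C", "D", "E"),
  NameB  = c("F", "F", "F", "G", "G"),
  NameC  = c("K", "L", "M", "N", "O"),
  Score1 = c(3, 5, 7, 2, 5),
  Score2 = c(3, 5, 4, 2, 8)
)

# Fun1b is a hypothetical name; lapply() returns a named list,
# so the aggregated column keeps the name given in `col`
Fun1b <- function(df, col, group_by) {
  setDT(df)[, lapply(.SD, mean), .SDcols = col, by = group_by]
}

res <- Fun1b(data, "Score1", "NameB")
print(res)
```

With this form, col can also be a character vector of several column names, in which case each of them is averaged per group.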

Note that Score2 is used in the next example to reproduce OP's depicted data frame:

Fun2 <- function(df, col, group_by){
  setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
}
Fun2(data, "Score2", "NameB")[]

   NameA NameB NameC Score1 Score2 Group_Mean
1:     A     F     K      3      3          4
2:     B     F     L      5      5          4
3:     C     F     M      7      4          4
4:     D     G     N      2      2          5
5:     E     G     O      5      8          5
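The sample functions take the column names as character strings, whereas the call in the question, Fun(data, Score1, NameB), passes them unquoted. If the unquoted form is really wanted, one possible approach is to capture the argument names with substitute() and deparse(). This is only a sketch; Fun2NSE and its internal variable names are made up for this illustration:

```r
library(data.table)

data <- data.table(
  NameA  = c("A", "B", "C", "D", "E"),
  NameB  = c("F", "F", "F", "G", "G"),
  NameC  = c("K", "L", "M", "N", "O"),
  Score1 = c(3, 5, 7, 2, 5),
  Score2 = c(3, 5, 4, 2, 8)
)

# Fun2NSE is a hypothetical name. substitute() captures the unevaluated
# argument and deparse() turns it into a string such as "Score2".
Fun2NSE <- function(df, col, group_by) {
  col_name <- deparse(substitute(col))
  by_name  <- deparse(substitute(group_by))
  setDT(df)[, Group_Mean := mean(get(col_name)), by = by_name]
}

res <- Fun2NSE(data, Score2, NameB)
print(res)
```

Passing strings, as in the sample functions, is usually easier to reason about, especially when the function is called from other functions.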

Example 3:

Fun3 <- function(df, col, group_by){
  setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
  df[get(col) != Score2]
}
Fun3(data, "Score1", "NameB")[]

   NameA NameB NameC Score1 Score2 Group_Mean
1:     C     F     M      7      4        5.0
2:     E     G     O      5      8        3.5
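Fun3 has the comparison column Score2 hard-coded. If that column should be configurable as well, it can be looked up with get() in the same way as col. A minimal sketch, where Fun3b and the parameter other are assumptions added for this illustration; it also works on a copy so that the input is left untouched:

```r
library(data.table)

data <- data.table(
  NameA  = c("A", "B", "C", "D", "E"),
  NameB  = c("F", "F", "F", "G", "G"),
  NameC  = c("K", "L", "M", "N", "O"),
  Score1 = c(3, 5, 7, 2, 5),
  Score2 = c(3, 5, 4, 2, 8)
)

# Fun3b and `other` are hypothetical; `other` names the column to compare with
Fun3b <- function(df, col, group_by, other = "Score2") {
  dt <- copy(df)   # work on a copy, leave the caller's object alone
  setDT(dt)
  dt[, Group_Mean := mean(get(col)), by = group_by]
  dt[get(col) != get(other)]   # keep rows where the two columns differ
}

res <- Fun3b(data, "Score1", "NameB")
print(res)
```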

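Note that the function below has been modified with respect to OP's draft in order to return a non-empty data.table: NewCol is the absolute difference between col and Group_Mean, and only rows with NewCol between 1.0 and 1.5 (bounds included) are kept.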

Fun4 <- function(df, col, group_by){
  setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
  df[, NewCol := abs(get(col) - Group_Mean)]
  df[between(NewCol, 1.0, 1.5, incbounds = TRUE)]
}
Fun4(data, "Score1", "NameB")[]

   NameA NameB NameC Score1 Score2 Group_Mean NewCol
1:     D     G     N      2      2        3.5    1.5
2:     E     G     O      5      8        3.5    1.5
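For readers unfamiliar with between(): with incbounds = TRUE it is equivalent to combining two comparisons with &. R has no "or" keyword as used in OP's draft; the vectorised operators are | and &. Also, a condition of the form "greater than 1 or less than 1.5" is true for every number, so it would not filter anything. A short sketch on the same sample data (the variable names are made up for this illustration):

```r
library(data.table)

dt <- data.table(
  NameA  = c("A", "B", "C", "D", "E"),
  NameB  = c("F", "F", "F", "G", "G"),
  Score1 = c(3, 5, 7, 2, 5)
)
dt[, Group_Mean := mean(Score1), by = NameB]
dt[, NewCol := abs(Score1 - Group_Mean)]

# Inclusive range check, as in Fun4
with_between <- dt[between(NewCol, 1.0, 1.5, incbounds = TRUE)]

# The same filter written with explicit comparisons joined by &
with_and <- dt[NewCol >= 1.0 & NewCol <= 1.5]

# OP's condition translated literally with |: every row satisfies it
with_or <- dt[NewCol > 1 | NewCol < 1.5]

print(with_between)
print(nrow(with_or))
```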

Note that data has been modified in place by all previous function calls:

data

   NameA NameB NameC Score1 Score2 Group_Mean NewCol
1:     A     F     K      3      3        5.0    2.0
2:     B     F     L      5      5        5.0    0.0
3:     C     F     M      7      4        5.0    2.0
4:     D     G     N      2      2        3.5    1.5
5:     E     G     O      5      8        3.5    1.5
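If modifying the input is not desired, one option is to take a deep copy with copy() inside the function before using := on it. A minimal sketch, assuming the same logic as Fun4; Fun4Copy is a name made up for this illustration:

```r
library(data.table)

data <- data.table(
  NameA  = c("A", "B", "C", "D", "E"),
  NameB  = c("F", "F", "F", "G", "G"),
  NameC  = c("K", "L", "M", "N", "O"),
  Score1 = c(3, 5, 7, 2, 5),
  Score2 = c(3, 5, 4, 2, 8)
)

# Fun4Copy is a hypothetical variant of Fun4 that leaves its input untouched
Fun4Copy <- function(df, col, group_by) {
  dt <- copy(df)   # deep copy, so := below does not touch the caller's object
  setDT(dt)        # no-op for a data.table, converts a plain data.frame
  dt[, Group_Mean := mean(get(col)), by = group_by]
  dt[, NewCol := abs(get(col) - Group_Mean)]
  dt[between(NewCol, 1.0, 1.5, incbounds = TRUE)]
}

out <- Fun4Copy(data, "Score1", "NameB")
print(out)
print(names(data))   # still only the five original columns
```

The copy costs memory proportional to the size of the input, so for large tables the in-place behaviour may be preferable when it is intended.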

Answer 1 (accepted), 2018-04-12 06:24:38
