R data.table: partially aggregate within groups and perform operations

Question

Is there a nice way to make a sub-group within a grouping column in data.table operations?

The result I would like is the output of this:

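library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Relabel every group other than "a" as "Other" (by reference), then sum value per group
dt[group!="a", group:="Other"][, sum(value), by=.(group)][]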

which gives:

group V1
a     6
Other 30

However, this alters the original data.table. I don't know whether there is a different way to do this that wouldn't involve merging two data.tables. I can imagine a more complicated use case where I want group %in% c("a","b") as one sub-group, group %in% c("c","d") as another, and so on.
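
A minimal sketch of one possible non-destructive approach (not taken from the answer below): data.table's by argument also accepts expressions, so a temporary grouping key can be computed inside the query instead of being assigned into dt. It assumes the dt from the question; grp_map is a hypothetical lookup vector invented here to illustrate the %in% c("a","b") / %in% c("c","d") case.

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Group by an expression instead of a stored column: the "a"/"Other" key
# exists only for this query, so dt itself is not modified
two_way <- dt[, .(V1 = sum(value)),
              by = .(group = ifelse(group == "a", "a", "Other"))]
print(two_way)

# For several sub-groups, map each original group to a sub-group label.
# grp_map is a hypothetical lookup vector chosen only for illustration.
grp_map <- c(a = "ab", b = "ab", c = "cd", d = "cd")
multi_way <- dt[, .(total = sum(value)),
                by = .(subgroup = unname(grp_map[group]))]
print(multi_way)

# dt still has its original groups
print(unique(dt$group))
```

Because by (unlike keyby) keeps groups in order of first appearance, two_way lists "a" before "Other" and multi_way lists "ab" before "cd".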

Answer 1

I think this is like a SQL right excluding join (using the terminology here).

You can go through the data by group and, within each group, perform an anti-join against the rows of the other groups.

# group is the by column, so it is not included in .SD; make a copy of it as g
dt[, g:=group]

#go through each group, anti-join with other groups, aggregate value
dt[, .(
        sumGrpVal=sum(value), 
        sumNonGrpVal=dt[!.SD, sum(value), on=c("group"="g")]
    ), by=.(group)]
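
To see what the anti-join step computes, here is a sketch that runs it outside the grouped query for a single group. The one-row table current_grp (a hypothetical name) stands in for .SD while the current group is "a"; dt[!i, on=...] keeps only the rows of dt that have no match in i.

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Hypothetical one-row stand-in for .SD when the current group is "a"
current_grp <- data.table(g = "a")

# Anti-join: keep the rows of dt whose group has NO match in current_grp$g
others <- dt[!current_grp, on = c(group = "g")]
print(others)            # only the "b" and "c" rows remain
print(sum(others$value)) # the sumNonGrpVal value for group "a"
```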

Or, an even faster way:

dt[, .(
    sumGrpVal=sum(value), 
    sumNonGrpVal=dt[group!=.BY$group, sum(value)]
), by=.(group)]
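
In the version above, .BY is a list holding the current group's values of the by columns, so .BY$group is the name of the group being processed. When the aggregate is additive, like sum, a further option (a different technique from the answer's, shown only as a sketch) is to compute the grand total once and subtract each group's own total, which avoids subsetting dt once per group. This does not carry over to non-additive summaries such as a median or a maximum.

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Grand total over all rows, computed once
grand_total <- dt[, sum(value)]

# Per-group total; the "all other groups" total is the remainder
res <- dt[, .(sumGrpVal = sum(value)), by = group]
res[, sumNonGrpVal := grand_total - sumGrpVal]
print(res)
```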

Output:

   group sumGrpVal sumNonGrpVal
1:     a         6           30
2:     b        15           21
3:     c        15           21

Solution 1
0 2018-09-27 00:21:42
