R data.table - aggregate partially within group and perform operation
Is there a nice way to make a sub-group within a grouping column in data.table operations?
The result I would like is the output of this:
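library(data.table)

dt <- data.table(
group = c("a","a","a","b","b","b","c","c"),
value = c(1,2,3,4,5,6,7,8)
)
# relabel every group except "a" as "Other" (in place), then sum value per group
dt[group!="a", group:="Other"][, sum(value), by=.(group)][]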
which gives:
group V1
a 6
Other 30
However, this alters the original data.table, because := modifies it by reference. I don't know whether there is a different way to do this that wouldn't involve merging two data.tables.
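One possible way to get the same result without touching dt is to compute the sub-group label inside by, since data.table accepts named expressions there. A minimal sketch, assuming the dt from above as first created; the ifelse() relabelling is an illustration, not the asker's code:

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Build the "a" / "Other" label on the fly inside by (illustrative relabelling);
# no := is used, so dt itself is not modified
res <- dt[, .(V1 = sum(value)),
          by = .(group = ifelse(group == "a", "a", "Other"))]
print(res)

# dt still holds the original groups "a", "b", "c"
print(unique(dt$group))
```

The label only exists in the result, so the original column stays intact for later queries.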
I can imagine a more complicated use case where I want group %in% c("a","b") as one sub-group and group %in% c("c","d") as another, etc.
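For several sub-groups, the same by-expression idea could be extended with fcase() (available in data.table 1.13.0 and later). A sketch under the assumption that the labels "ab", "cd" and "other" are hypothetical names chosen for illustration:

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Map each original group to a sub-group label (labels are hypothetical),
# then aggregate by that label; nothing is merged and dt is not modified
res <- dt[, .(V1 = sum(value)),
          by = .(subgroup = fcase(group %in% c("a", "b"), "ab",
                                  group %in% c("c", "d"), "cd",
                                  default = "other"))]
print(res)
```

Groups not listed in any condition fall into the default label instead of becoming NA.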
I think this is like an SQL right excluding join (using the terminology here).
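In data.table, an excluding (anti-) join is written X[!Y, on = ...]: it keeps the rows of X that have no match in Y. A small sketch, where excl is a hypothetical one-row table naming the group to drop:

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Hypothetical table listing the group(s) to exclude
excl <- data.table(group = "a")

# Anti-join: rows of dt whose group has no match in excl
rest <- dt[!excl, on = "group"]
print(rest)

# Aggregate over everything that is not group "a"
print(rest[, sum(value)])
```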
You can go through dt by group and, within each group, perform an anti-join against the rest of the table:
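# group is a grouping column, so it is not available inside .SD; keep a copy of it as g
# (assumes dt as first created, before the question's group:="Other" step)
dt[, g:=group]
# for each group, anti-join dt with that group's rows (.SD) and sum value over the remaining rows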
dt[, .(
sumGrpVal=sum(value),
sumNonGrpVal=dt[!.SD, sum(value), on=c("group"="g")]
), by=.(group)]
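Note that dt[, g:=group] adds the helper column g to dt by reference, which is the kind of side effect the question wanted to avoid. A sketch of one way around it, assuming it is acceptable to pay for a copy() of the table (tmp is a hypothetical name):

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Work on a copy so the helper column g is added to tmp, not to dt
tmp <- copy(dt)
tmp[, g := group]

# Same anti-join as above, run against the copy
res <- tmp[, .(
  sumGrpVal = sum(value),
  sumNonGrpVal = tmp[!.SD, sum(value), on = c("group" = "g")]
), by = .(group)]
print(res)
```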
Or, an even faster way:
# .BY$group is the current group's value, so a plain subset of dt replaces the join
dt[, .(
sumGrpVal=sum(value),
sumNonGrpVal=dt[group!=.BY$group, sum(value)]
), by=.(group)]
Output:
group sumGrpVal sumNonGrpVal
1: a 6 30
2: b 15 21
3: c 15 21
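As a sanity check on the output: value sums to 36 overall, so each sumNonGrpVal is the total minus sumGrpVal (36 - 6 = 30, 36 - 15 = 21). For sum specifically (an additive aggregate), that gives a join-free shortcut; this is a swapped-in technique, not the answer's method, and it would not carry over to non-additive functions such as median():

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Grand total over all rows (36 for this data)
total <- dt[, sum(value)]

# Per-group sum, then the complement as total minus the group's own sum;
# only valid because sum is additive
res <- dt[, .(sumGrpVal = sum(value)), by = .(group)][
  , sumNonGrpVal := total - sumGrpVal][]
print(res)
```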