
R data.table - aggregate partially within group and perform operation

Is there a nice way to make a sub-group within a grouping column in data.table operations?

The result I would like is the output from this:

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

dt[group!="a", group:="Other"][, sum(value), by=.(group)][]

which gives

group V1
a     6
Other 30

However, this alters the original data.table. I don't know if there is a different way to do this that wouldn't involve merging two data.tables. I can imagine a more complicated use case where I want group %in% c("a","b") as one sub-group and group %in% c("c","d") as another, etc.
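One possible way to get this result without modifying dt is to pass an expression to by, so the sub-group label is computed on the fly. This is only a sketch, not something proposed in the thread; the names grouped, lookup, multi, and subgroup are illustrative assumptions.

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Group by a computed label; the group column in dt is left untouched
grouped <- dt[, .(V1 = sum(value)),
              by = .(group = ifelse(group == "a", "a", "Other"))]

# Hypothetical lookup vector for the multi-sub-group case:
# "a" and "b" form one sub-group, "c" and "d" another
lookup <- c(a = "ab", b = "ab", c = "cd", d = "cd")
multi <- dt[, .(V1 = sum(value)),
            by = .(subgroup = unname(lookup[group]))]
```

Since no := is used, dt keeps its original group values after both calls.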

I think this is like a SQL right excluding join (using the terminology here).

You can iterate by group and, within each group, perform an anti-join:

# the grouping column is not available in .SD, so keep a copy of it
# (note that this adds column g to dt by reference)
dt[, g:=group]

# for each group, anti-join dt against that group's rows and sum the remaining values
dt[, .(
        sumGrpVal=sum(value), 
        sumNonGrpVal=dt[!.SD, sum(value), on=c("group"="g")]
    ), by=.(group)]

Or, an even faster way:

dt[, .(
    sumGrpVal=sum(value), 
    sumNonGrpVal=dt[group!=.BY$group, sum(value)]
), by=.(group)]

Output:

   group sumGrpVal sumNonGrpVal
1:     a         6           30
2:     b        15           21
3:     c        15           21
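Because the aggregate here is a plain sum, the non-group total can also be obtained by subtracting each group's sum from the grand total, which avoids scanning dt again for every group. This is a sketch that assumes an additive aggregate and the original, unmodified dt; it would not carry over to something like a median. The names total and res are illustrative.

```r
library(data.table)

dt <- data.table(
  group = c("a","a","a","b","b","b","c","c"),
  value = c(1,2,3,4,5,6,7,8)
)

# Grand total computed once, outside the grouped call
total <- dt[, sum(value)]

# Per-group sum, and everything outside the group as total minus that sum
res <- dt[, .(
  sumGrpVal    = sum(value),
  sumNonGrpVal = total - sum(value)
), by = .(group)]
```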
