data.table sum and subset
I have a data.table that I want to aggregate:
library(data.table)
dt1 <- data.table(year=c("2001","2001","2001","2002","2002","2002","2002"),
group=c("a","a","b","a","a","b","b"),
amt=c(20,40,20,35,30,28,19))
I want to sum amt by year and group, and then keep only the groups whose summed amt across all years is greater than 100.
I've got the data.table sum nailed:
dt1[, sum(amt),by=list(year,group)]
year group V1
1: 2001 a 60
2: 2001 b 20
3: 2002 a 65
4: 2002 b 47
I am having trouble with the final level of filtering.
The end result I am looking for is:
year group V1
1: 2001 a 60
2: 2002 a 65
This is because for a) 60 + 65 > 100, whereas for b) 20 + 47 <= 100.
Any thoughts on how to achieve this would be great.
I had a look at the question "data.table sum by group and return row with max value" and was wondering whether there is an equally elegant solution to my problem.
A one-liner in data.table:
dt1[, lapply(.SD,sum), by=.(year,group)][, if (sum(amt) > 100) .SD, by=group]
# group year amt
#1: a 2001 60
#2: a 2002 65
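For readers who find the chained one-liner dense, the same logic can be unpacked into explicit steps. This is only a sketch of one possible breakdown, not the answerer's own code; the names by_year and group_total are illustrative and do not appear in the original answer.

```r
library(data.table)

dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))

# Step 1: sum amt within each (year, group) pair
by_year <- dt1[, .(amt = sum(amt)), by = .(year, group)]

# Step 2: attach each group's total across all years as a helper column
# (group_total is an illustrative name)
by_year[, group_total := sum(amt), by = group]

# Step 3: keep rows from groups whose total exceeds 100,
# selecting only the original columns
result <- by_year[group_total > 100, .(year, group, amt)]
print(result)
```

The helper column makes the filter condition visible in the intermediate table, which can help when checking why a group was kept or dropped.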
You can do:
library(dplyr)
dt1 %>%
group_by(group, year) %>%
summarise(amt = sum(amt)) %>%
filter(sum(amt) > 100)
Which gives:
#Source: local data table [2 x 3]
#Groups: group
#
# year group amt
#1 2001 a 60
#2 2002 a 65
This might not be an ideal solution, but I would do it in several steps, as follows:
dt2=dt1[, sum(amt),by=list(year,group)]
dt3=dt1[, sum(amt)>100,by=list(group)]
dt_result=dt2[group %in% dt3[V1==TRUE]$group,]
Here's a two-liner. Find the subset of groups you want first, then aggregate only those groups:
big_groups <- dt1[,sum(amt),by=group][V1>100]$group
dt1[group%in%big_groups,sum(amt),by=list(year,group)]
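If the threshold needs to vary, the filter-then-aggregate idea above could be wrapped in a small function. The sketch below assumes the column names amt, year and group from the question; the function name sum_big_groups and its threshold argument are hypothetical and not part of data.table.

```r
library(data.table)

dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))

# Hypothetical helper: sum amt by year and group, keeping only groups
# whose total amt across all years exceeds `threshold`.
sum_big_groups <- function(dt, threshold = 100) {
  # Groups whose overall total is above the threshold
  big_groups <- dt[, .(total = sum(amt)), by = group][total > threshold, group]
  # Aggregate only the rows belonging to those groups
  dt[group %in% big_groups, .(amt = sum(amt)), by = .(year, group)]
}

res <- sum_big_groups(dt1)
print(res)
```

Naming the aggregated column amt rather than leaving the default V1 keeps the output consistent with the other answers.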