
data.table sum and subset

I have a data.table that I want to aggregate:

library(data.table)
dt1 <- data.table(year=c("2001","2001","2001","2002","2002","2002","2002"),
                  group=c("a","a","b","a","a","b","b"), 
                  amt=c(20,40,20,35,30,28,19))

I want to sum amt by year and group, and then keep only the rows for groups whose total amt across all years is greater than 100.

I have the grouped sum working:

dt1[, sum(amt),by=list(year,group)]

   year group V1
1: 2001     a 60
2: 2001     b 20
3: 2002     a 65
4: 2002     b 47

I am having trouble with the final filtering step.

The end result I am looking for is:

   year group V1
1: 2001     a 60
2: 2002     a 65

This is because group a totals 60 + 65 = 125, which is greater than 100, whereas group b totals 20 + 47 = 67, which is not.
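
For reference, the per-group totals behind this comparison can be checked with a quick sketch. The names `group_totals` and `total` are labels chosen here for illustration and are not part of the original post.

```r
library(data.table)

dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))

# Total amt per group across all years (`total` is a hypothetical column name).
group_totals <- dt1[, .(total = sum(amt)), by = group]
print(group_totals)
```

Only group a clears the threshold of 100, which is why only its rows appear in the desired output.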

Any thoughts on how to achieve this would be great.

I had a look at the question "data.table sum by group and return row with max value" and was wondering whether there is an equally elegant solution to my problem.

A one-liner in data.table:

dt1[, lapply(.SD,sum), by=.(year,group)][, if (sum(amt) > 100) .SD, by=group]

#   group year amt
#1:     a 2001  60
#2:     a 2002  65
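
To make the chained call easier to follow, here is a sketch that splits it into two steps. It assumes the same `dt1` as in the question; the intermediate names `by_year_group` and `result` are hypothetical, and the sum column is named explicitly instead of using `lapply(.SD, sum)`.

```r
library(data.table)

dt1 <- data.table(year  = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt   = c(20,40,20,35,30,28,19))

# Step 1: sum amt within each (year, group) pair.
by_year_group <- dt1[, .(amt = sum(amt)), by = .(year, group)]

# Step 2: for each group, return its rows only if the group's overall total
# exceeds 100. When the condition is FALSE, `if` yields NULL, so that group
# contributes no rows to the result.
result <- by_year_group[, if (sum(amt) > 100) .SD, by = group]
print(result)
```

Naming the column in step 1 keeps `amt` available for the second step, whereas an unnamed `sum(amt)` would produce the default `V1`.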

With dplyr, you can do:

library(dplyr)
dt1 %>% 
  group_by(group, year) %>% 
  summarise(amt = sum(amt)) %>%
  filter(sum(amt) > 100)

Which gives:

#Source: local data table [2 x 3]
#Groups: group
#
#  year group amt
#1 2001     a  60
#2 2002     a  65

This might not be an ideal solution, but I would do it in several steps, as follows:

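# Sum amt for each (year, group) pair
dt2 <- dt1[, sum(amt), by = list(year, group)]
# Flag groups whose overall total exceeds 100
dt3 <- dt1[, sum(amt) > 100, by = list(group)]
# Keep only the rows belonging to flagged groups
dt_result <- dt2[group %in% dt3[V1 == TRUE]$group, ]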

Here's a two-liner. Find the subset of groups you want first, then aggregate only those groups:

big_groups <- dt1[,sum(amt),by=group][V1>100]$group
dt1[group%in%big_groups,sum(amt),by=list(year,group)]
