data.table sum and subset

Question

I have a data.table that I want to aggregate:

library(data.table)
dt1 <- data.table(year=c("2001","2001","2001","2002","2002","2002","2002"),
                  group=c("a","a","b","a","a","b","b"), 
                  amt=c(20,40,20,35,30,28,19))

I want to sum amt by year and group, and then keep only the groups whose summed amt (over all years) is greater than 100.

I've got the data.table sum nailed:

dt1[, sum(amt),by=list(year,group)]

   year group V1
1: 2001     a 60
2: 2001     b 20
3: 2002     a 65
4: 2002     b 47

I am having trouble with the final filtering step.

The end result I am looking for is:

   year group V1
1: 2001     a 60
2: 2002     a 65

because for group a, 60 + 65 = 125 > 100, whereas for group b, 20 + 47 = 67 <= 100.
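The 60 + 65 and 20 + 47 arithmetic above is simply a second sum over the aggregated table, this time by group alone. A minimal check, assuming the default V1 name that data.table gives the unnamed sum(amt) result (agg and totals are illustrative names):

```r
library(data.table)

dt1 <- data.table(year = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt = c(20,40,20,35,30,28,19))

# First level: sum of amt per year and group (the unnamed result column is V1)
agg <- dt1[, sum(amt), by = list(year, group)]

# Second level: total of those sums per group, across all years
# (group a totals 125, group b totals 67)
totals <- agg[, .(total = sum(V1)), by = group]
print(totals)
```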

Any thoughts on how to achieve this would be great.

I had a look at the question "data.table sum by group and return row with max value" and was wondering whether there is an equally elegant solution to my problem.

Answer 1

A one-liner in data.table:

dt1[, lapply(.SD,sum), by=.(year,group)][, if (sum(amt) > 100) .SD, by=group]

#   group year amt
#1:     a 2001  60
#2:     a 2002  65
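The one-liner works because, inside j, an if with no else returns NULL when its condition is FALSE, and data.table drops any group whose j result is NULL. The same filter can also be written more explicitly by storing each group's total in a helper column with := and subsetting on it. A sketch of that variant, where agg and total are illustrative names:

```r
library(data.table)

dt1 <- data.table(year = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt = c(20,40,20,35,30,28,19))

# Aggregate to one row per year and group, keeping the column name amt
agg <- dt1[, .(amt = sum(amt)), by = .(year, group)]

# Add each group's overall total as a helper column (modifies agg by reference)
agg[, total := sum(amt), by = group]

# Keep the groups whose total exceeds 100, then drop the helper column
res <- agg[total > 100][, total := NULL]
print(res)
```

Unlike the one-liner, this keeps the original year, group column order.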

Answer 2

With dplyr, you can do:

library(dplyr)
dt1 %>% 
  group_by(group, year) %>% 
  summarise(amt = sum(amt)) %>%
  filter(sum(amt) > 100)

Which gives:

#Source: local data table [2 x 3]
#Groups: group
#
#  year group amt
#1 2001     a  60
#2 2002     a  65
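The dplyr pipeline works because summarise() drops the last grouping variable (year) by default, so filter() then evaluates sum(amt) within each group. For comparison, here is a package-free sketch of the same two-level logic in base R. It swaps in aggregate() for the first sum and ave() for the per-group total; this is an alternative technique, not part of the answer above, and agg, group_total and res are illustrative names:

```r
# Base R only: the same data as a plain data.frame
dt1 <- data.frame(year = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt = c(20,40,20,35,30,28,19))

# First level: sum amt per year and group
agg <- aggregate(amt ~ year + group, data = dt1, FUN = sum)

# Second level: ave() repeats each group's total on every row of that group
group_total <- ave(agg$amt, agg$group, FUN = sum)

# Keep the rows belonging to groups whose total exceeds 100
res <- agg[group_total > 100, ]
print(res)
```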

Answer 3

This might not be an ideal solution, but I would do it in several steps, as follows:

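# Step 1: total amt per year and group
dt2 <- dt1[, sum(amt), by = list(year, group)]
# Step 2: flag the groups whose overall total exceeds 100
dt3 <- dt1[, sum(amt) > 100, by = list(group)]
# Step 3: keep only the rows of dt2 whose group is flagged
dt_result <- dt2[group %in% dt3[V1 == TRUE]$group, ]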
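The last step of this approach filters dt2 with %in%. An equivalent sketch uses a data.table join on the group column instead, assuming a data.table version recent enough to support the on = argument; it keeps the answer's dt2/dt3 names, and keep is an illustrative name:

```r
library(data.table)

dt1 <- data.table(year = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt = c(20,40,20,35,30,28,19))

dt2 <- dt1[, sum(amt), by = list(year, group)]
dt3 <- dt1[, sum(amt) > 100, by = list(group)]

# One-column table of the flagged groups, used as the join key
keep <- dt3[V1 == TRUE, .(group)]

# Join: rows of dt2 whose group appears in keep
dt_result <- dt2[keep, on = "group"]
print(dt_result)
```

One difference: the join returns rows in the order of keep, while %in% preserves the row order of dt2. With a single qualifying group, as here, both give the same two rows.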

Answer 4

Here's a two-liner. First find the subset of groups you want, then aggregate only those:

big_groups <- dt1[,sum(amt),by=group][V1>100]$group
dt1[group%in%big_groups,sum(amt),by=list(year,group)]
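Filtering the raw rows first is safe here because a group's overall total is the same whether or not it is split by year. A quick sketch checking that this two-liner and the one-liner from Answer 1 return the same rows (res1, res4 and same are illustrative names; Answer 1 puts group first and names the sum amt, while this version names it V1):

```r
library(data.table)

dt1 <- data.table(year = c("2001","2001","2001","2002","2002","2002","2002"),
                  group = c("a","a","b","a","a","b","b"),
                  amt = c(20,40,20,35,30,28,19))

# Answer 1: aggregate first, then keep groups whose total exceeds 100
res1 <- dt1[, lapply(.SD, sum), by = .(year, group)][, if (sum(amt) > 100) .SD, by = group]

# Answer 4: find the big groups on the raw rows, then aggregate only those
# ([V1 > 100, group] is equivalent to [V1 > 100]$group)
big_groups <- dt1[, sum(amt), by = group][V1 > 100, group]
res4 <- dt1[group %in% big_groups, sum(amt), by = list(year, group)]

# Same rows and values, despite the different column order and names
same <- identical(res1$group, res4$group) &&
  identical(res1$year, res4$year) &&
  identical(res1$amt, res4$V1)
print(same)
```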

Question description

4 answers

Answer 1
Score 19, accepted, 2015-05-12 04:40:16

Answer 2
Score 6, 2015-05-12 02:28:22

Answer 3
Score 3, 2015-05-12 02:37:59

Answer 4
Score 2, 2015-05-12 02:33:08