
Grouped count aggregation in R data.table

I have a data.table that contains dates, buy values and sell values. I'd like to count how many buys and sells there are per day, and also the total number of buys and sells. I'm finding this a little tricky to do in data.table.

   date   buy sell      
2011-01-01  1   0
2011-01-02  0   0
2011-01-03  0   2
2011-01-04  3   0
2011-01-05  0   0
2011-01-06  0   0
2011-01-01  0   0
2011-01-02  0   1
2011-01-03  4   0
2011-01-04  0   0
2011-01-05  0   0
2011-01-06  0   0
2011-01-01  0   0
2011-01-02  0   8
2011-01-03  2   0
2011-01-04  0   0
2011-01-05  0   0
2011-01-06  0   5

The above data.table can be created with the following code:

 library(data.table)
 DT = data.table(
          date = rep(as.Date('2011-01-01') + 0:5, 3),
          buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
          sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))

What I want as a result is:

   date   total_buys   total_sells
2011-01-01    1            0
2011-01-02    0            2
                and so on  

Furthermore, I'd also like to know the total number of buys and sells:

 total_buys   total_sells
     4            4

I have tried:

 length(DT[sell > 0 | buy > 0])
 ## [1] 3

That is a strange answer, and I'd like to know why.
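The `3` is expected behaviour: a data.table, like a data.frame, is a list of columns, so `length()` returns the number of columns (`date`, `buy`, `sell`), not the number of rows. A minimal sketch of counting the matching rows instead, using `nrow()` or data.table's special symbol `.N` (the variable names are only illustrative):

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# A data.table is a list of columns, so length() counts columns
n_cols <- length(DT)                     # 3

# Number of rows with any buy or sell activity
n_rows <- nrow(DT[sell > 0 | buy > 0])   # 8

# Same count using .N inside DT[...]
n_rows_N <- DT[sell > 0 | buy > 0, .N]   # 8
```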

## by date
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date]
##          date total_buys total_sells
## 1: 2011-01-01          1           0
## 2: 2011-01-02          0           2
## 3: 2011-01-03          2           1
## 4: 2011-01-04          1           0
## 5: 2011-01-05          0           0
## 6: 2011-01-06          0           1
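This works because `buy > 0` produces a logical vector and `sum()` treats `TRUE` as 1 and `FALSE` as 0, so it counts the rows with a positive value rather than adding up the values themselves. For 2011-01-03 the buy values are 0, 4 and 2, which gives a count of 2, whereas summing the raw values would give 6. A short sketch contrasting the two per date (the `n_buys`/`qty_bought` names are just for illustration):

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# sum() over a logical vector counts the TRUE values
n_0103   <- sum(c(0, 4, 2) > 0)   # 2 buys on 2011-01-03
qty_0103 <- sum(c(0, 4, 2))       # 6 = sum of the buy values on 2011-01-03

# Per date: number of buys vs. sum of the buy values
by_date <- DT[, list(n_buys = sum(buy > 0), qty_bought = sum(buy)), by = date]
```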

DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0))]
##    total_buys total_sells
## 1:          4           4
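Since the grand totals are just the column sums of the per-date counts, they can also be derived from the grouped result instead of scanning `DT` a second time. A sketch, where `by_date` and `totals` are names chosen here for illustration:

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# Per-date counts of rows with a positive buy / sell value
by_date <- DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date]

# Grand totals: add up the per-date counts column by column
totals <- by_date[, lapply(.SD, sum), .SDcols = c("total_buys", "total_sells")]
```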

An alternative to @Jake's answer is the typical melt + dcast routine, something like:

library(data.table)  # melt() and dcast() methods for data.tables ship with data.table; reshape2 is not needed
dtL <- melt(DT, id.vars = "date")
dcast(dtL, date ~ variable, value.var = "value",
      fun.aggregate = function(x) sum(x > 0))
#         date buy sell
# 1 2011-01-01   1    0
# 2 2011-01-02   0    2
# 3 2011-01-03   2    1
# 4 2011-01-04   1    0
# 5 2011-01-05   0    0
# 6 2011-01-06   0    1

Or, without melting, just:

DT[, lapply(.SD, function(x) sum(x > 0)), by = date]
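The `lapply(.SD, ...)` version keeps the original column names (`buy`, `sell`). If the `total_buys`/`total_sells` names from the question are wanted, one option is to rename the result by reference with `setnames()`; listing the columns in `.SDcols` also keeps any other columns `DT` might have out of the count. A sketch:

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# Count positive values per date, for the listed columns only
res <- DT[, lapply(.SD, function(x) sum(x > 0)), by = date, .SDcols = c("buy", "sell")]

# Rename in place to match the desired output
setnames(res, c("buy", "sell"), c("total_buys", "total_sells"))
```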

To get your other table (the overall totals), try:

dtL[, list(count = sum(value > 0)), by = variable]
#    variable count
# 1:      buy     4
# 2:     sell     4

Or, without melting:

DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = c("buy", "sell")]
#    buy sell
# 1:   4    4
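If both tables are wanted in a single object, recent versions of data.table also provide `rollup()`, which computes the per-date groups plus a grand-total row in one call; in the total row the `date` column is `NA`. A sketch, assuming a data.table version that includes `rollup()`:

```r
library(data.table)

DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# One row per date, plus a grand-total row where date is NA
res <- rollup(DT, j = lapply(.SD, function(x) sum(x > 0)),
              by = "date", .SDcols = c("buy", "sell"))
```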

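Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the address of the original post. For any questions, contact: yoyou2525@163.com.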

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM