
Grouped count aggregation in R data.table

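I have a table containing dates, buy values and sell values. I want to count the number of buys and sells per day, and also the total number of buys and sells. I'm finding this a bit tricky to do in data.table.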

   date   buy sell      
2011-01-01  1   0
2011-01-02  0   0
2011-01-03  0   2
2011-01-04  3   0
2011-01-05  0   0
2011-01-06  0   0
2011-01-01  0   0
2011-01-02  0   1
2011-01-03  4   0
2011-01-04  0   0
2011-01-05  0   0
2011-01-06  0   0
2011-01-01  0   0
2011-01-02  0   8
2011-01-03  2   0
2011-01-04  0   0
2011-01-05  0   0
2011-01-06  0   5

The data.table above can be created with the following code:

 library(data.table)
 DT = data.table(
           date = rep(as.Date('2011-01-01') + 0:5, 3),
           buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
           sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))

The result I want is:

   date   total_buys   total_sells
2011-01-01    1            0
2011-01-02    0            2
                and so on  

In addition, I would like the overall number of buys and sells:

 total_buys   total_sells
     4            4

I tried:

 length(DT[sell > 0 | buy > 0])
 > 3 

which is a strange answer (I would like to know why).

## by date
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date]
##          date total_buys total_sells
## 1: 2011-01-01          1           0
## 2: 2011-01-02          0           2
## 3: 2011-01-03          2           1
## 4: 2011-01-04          1           0
## 5: 2011-01-05          0           0
## 6: 2011-01-06          0           1

DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0))]
##    total_buys total_sells
## 1:          4           4
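
As for the `length()` result in the question: a data.table is a list of columns, so `length()` reports the number of columns (here `date`, `buy`, `sell`), not the number of matching rows. The sketch below rebuilds the example data and contrasts `length()` with `nrow()` and `.N`. It also shows why `sum(buy > 0)` works as a counter: the comparison yields a logical vector, and summing it counts the `TRUE` values. The variable names (`active`, `n_cols`, and so on) are only illustrative.

```r
library(data.table)

# Same example data as in the question
DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# Rows with at least one buy or sell
active <- DT[sell > 0 | buy > 0]

n_cols  <- length(active)              # counts columns: date, buy, sell
n_rows  <- nrow(active)                # counts rows that matched the filter
n_rows2 <- DT[sell > 0 | buy > 0, .N]  # same row count, computed inside data.table

# A logical vector sums as TRUE = 1, FALSE = 0, so this counts the buys
n_buys <- sum(DT$buy > 0)

print(c(n_cols = n_cols, n_rows = n_rows, n_rows2 = n_rows2, n_buys = n_buys))
```

With this data the filter keeps 8 rows (4 buys and 4 sells, never on the same row), while `length()` stays at 3 regardless of how many rows match.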

An alternative is the typical melt + dcast routine, for example:

# melt() and dcast() for data.table objects ship with data.table (1.9.6+),
# so reshape2 is not needed here
dtL <- melt(DT, id.vars = "date")
dcast(dtL, date ~ variable, value.var = "value",
      fun.aggregate = function(x) sum(x > 0))
#         date buy sell
# 1 2011-01-01   1    0
# 2 2011-01-02   0    2
# 3 2011-01-03   2    1
# 4 2011-01-04   1    0
# 5 2011-01-05   0    0
# 6 2011-01-06   0    1

Or, without melting, simply:

DT[, lapply(.SD, function(x) sum(x > 0)), by = date]

To get the other table, try:

dtL[, list(count = sum(value > 0)), by = variable]
#    variable count
# 1:      buy     4
# 2:     sell     4

Or, without melting:

DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = c("buy", "sell")]
#    buy sell
# 1:   4    4
