Grouped count aggregation in R data.table
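I have a table containing dates, buy values, and sell values. I want to count the number of buys and sells per day, as well as the total number of buys and sells overall. I am finding this a bit tricky in data.table.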
date buy sell
2011-01-01 1 0
2011-01-02 0 0
2011-01-03 0 2
2011-01-04 3 0
2011-01-05 0 0
2011-01-06 0 0
2011-01-01 0 0
2011-01-02 0 1
2011-01-03 4 0
2011-01-04 0 0
2011-01-05 0 0
2011-01-06 0 0
2011-01-01 0 0
2011-01-02 0 8
2011-01-03 2 0
2011-01-04 0 0
2011-01-05 0 0
2011-01-06 0 5
The data.table above can be created with the following code:
library(data.table)
DT = data.table(
  date = rep(as.Date('2011-01-01') + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))
The result I want is:
date total_buys total_sells
2011-01-01 1 0
2011-01-02 0 2
and so on
In addition, I would like the overall number of buys and sells:
total_buys total_sells
4 4
I tried:
length(DT[sell > 0 | buy > 0])
# [1] 3
This is a strange answer (I wonder why).
## by date
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date]
## date total_buys total_sells
## 1: 2011-01-01 1 0
## 2: 2011-01-02 0 2
## 3: 2011-01-03 2 1
## 4: 2011-01-04 1 0
## 5: 2011-01-05 0 0
## 6: 2011-01-06 0 1
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0))]
## total_buys total_sells
## 1: 4 4
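As for the puzzling `3` in the question: a data.table is a list of columns, so `length()` reports the number of columns (date, buy, sell) rather than the number of matching rows. The sketch below rebuilds the sample data so it runs on its own and contrasts `length()` with `nrow()` and `.N`; the variable name `sub` is only illustrative.

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# Rows with at least one buy or sell
sub <- DT[sell > 0 | buy > 0]

length(sub)                   # 3: the number of columns, not rows
nrow(sub)                     # 8: the number of matching rows
DT[sell > 0 | buy > 0, .N]    # 8: the same row count via data.table's .N
```

Note that the row count of 8 is the number of rows with any activity, which is why the per-column `sum(buy > 0)` and `sum(sell > 0)` are still needed to get separate buy and sell totals.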
An alternative to the answer above is the typical melt + dcast routine, for example:
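library(data.table)  # current data.table versions provide their own melt() and dcast()
dtL <- melt(DT, id.vars = "date")   # long format: one row per (date, variable)
dcast(dtL, date ~ variable, value.var = "value",
      fun.aggregate = function(x) sum(x > 0))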
# date buy sell
# 1 2011-01-01 1 0
# 2 2011-01-02 0 2
# 3 2011-01-03 2 1
# 4 2011-01-04 1 0
# 5 2011-01-05 0 0
# 6 2011-01-06 0 1
Or, without melting, simply:
DT[, lapply(.SD, function(x) sum(x > 0)), by = date]
To get the other table (the overall totals), try:
dtL[, list(count = sum(value > 0)), by = variable]
# variable count
# 1: buy 4
# 2: sell 4
Or, without melting:
DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = c("buy", "sell")]
# buy sell
# 1: 4 4
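The `.SD` results above keep the original column names `buy` and `sell`, whereas the question asked for `total_buys` and `total_sells`. A minimal sketch of one way to rename them, assuming data.table's `setnames()` and rebuilding the sample data so it runs on its own (the name `res` is only illustrative):

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# Count positive entries per date for the chosen columns
res <- DT[, lapply(.SD, function(x) sum(x > 0)),
          by = date, .SDcols = c("buy", "sell")]

# Rename the count columns in place to match the requested output
setnames(res, c("buy", "sell"), c("total_buys", "total_sells"))
res
```

Naming the columns directly inside `list(...)`, as in the first answer, avoids the extra step; the `.SD` form is mainly convenient when there are many columns to count.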