Groupby bins and aggregate in R
I have data with columns (a, b, c):
a b c
1 2 1
2 3 1
9 2 2
1 6 2
The range of 'a' should be divided into n (say 3) equal parts, an aggregate function (say max) should be applied to the b values in each part, and the result should also be grouped by 'c'.
So the output looks like this:
a_bin b_m(c=1) b_m(c=2)
1-3 3 6
4-6 NaN NaN
7-9 NaN 2
The result is M x N, where M is the number of a bins and N is the number of unique c values (or the full range of c).
How do I approach this? Is there an R package that can help?
There may be easier ways.
If your dataset is dat:
res <- sapply(split(dat[, -3], dat$c), function(x) {
  # Bin a into [1,3], (3,6], (6,9] and label the bins
  a_bin <- with(x, cut(a, breaks = c(1, 3, 6, 9), include.lowest = TRUE,
                       labels = c("1-3", "4-6", "7-9")))
  # Max of b per bin; bins with no rows give NA
  c(by(x$b, a_bin, FUN = max))
})
res1 <- setNames(data.frame(row.names(res), res),
                 c("a_bin", "b_m(c=1)", "b_m(c=2)"))
row.names(res1) <- 1:nrow(res1)
res1
a_bin b_m(c=1) b_m(c=2)
1 1-3 3 6
2 4-6 NA NA
3 7-9 NA 2
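The breaks above are hard-coded, while the question asks for n equal parts. A minimal base R sketch of that generalization follows, assuming "equal parts" means equal-width bins over the observed range of a (so the bin labels differ from the integer labels used above). The names n, breaks and a_bin are illustrative only.

```r
# Sample data from the question
dat <- data.frame(a = c(1, 2, 9, 1),
                  b = c(2, 3, 2, 6),
                  c = c(1, 1, 2, 2))

n <- 3  # number of equal-width bins (assumption: bins span min(a)..max(a))
breaks <- seq(min(dat$a), max(dat$a), length.out = n + 1)
a_bin <- cut(dat$a, breaks = breaks, include.lowest = TRUE)

# tapply over two grouping factors returns an M x N matrix;
# combinations with no rows are filled with NA
res <- tapply(dat$b, list(a_bin = a_bin, c = dat$c), max)
res
```

Because a_bin is a factor that keeps all n levels, the empty middle bin still appears as a row of NA values.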
I would use a combination of data.table and reshape2, both of which are optimized for speed (they do not rely on the R-level loops behind the apply family).
Note that the output will not include the unused bins.
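# temp is the data frame with columns a, b, c
v <- c(1, 4, 7, 10) # bin boundaries: [1,4), [4,7), [7,10)
temp$int <- findInterval(temp$a, v) # bin index for each value of a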
library(data.table)
temp <- setDT(temp)[, list(b_m = max(b)), by = c("c", "int")]
library(reshape2)
temp <- dcast.data.table(temp, int ~ c, value.var = "b_m")
## colnames(temp) <- c("a_bin", "b_m(c=1)", "b_m(c=2)") # Optional for prettier table
## temp$a_bin<- c("1-3", "7-9") # Optional for prettier table
## a_bin b_m(c=1) b_m(c=2)
## 1 1-3 3 6
## 2 7-9 NA 2
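To see why the middle bin is missing from that result, it may help to look at the binning step on its own. The sketch below uses only base R and the a values from the question. The factor conversion is an assumption added here for illustration (it is not part of the answer above) and shows one way to keep track of empty bins.

```r
a <- c(1, 2, 9, 1)     # column a from the question
v <- c(1, 4, 7, 10)    # bin boundaries: [1,4), [4,7), [7,10)

# findInterval returns, for each value, the index of the bin it falls in
int <- findInterval(a, v)
int

# No value lands in bin 2, so grouping by int never produces that group.
# Turning the index into a factor with all bins as levels keeps it visible.
int_f <- factor(int, levels = seq_len(length(v) - 1),
                labels = c("1-3", "4-6", "7-9"))
table(int_f)
```

The count for "4-6" is zero, which is why a plain group-by on int returns only two rows.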
A combination of aggregate, cut and reshape seems to work:
df <- data.frame(a = c(1,2,9,1),
                 b = c(2,3,2,6),
                 c = c(1,1,2,2))
breaks <- c(0, 3, 6, 9)
# Aggregate data
ag <- aggregate(df$b, FUN=max,
                by=list(a=cut(df$a, breaks, include.lowest=TRUE), c=df$c))
# Reshape data
res <- reshape(ag, idvar="a", timevar="c", direction="wide")
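As with the data.table approach, aggregate only returns groups that contain data, so the empty bin does not appear in res. The following is a hedged sketch of one way to add it back and rename the columns to match the requested layout. The names a_bin and res_full are introduced here for illustration only.

```r
df <- data.frame(a = c(1, 2, 9, 1),
                 b = c(2, 3, 2, 6),
                 c = c(1, 1, 2, 2))
breaks <- c(0, 3, 6, 9)
a_bin <- cut(df$a, breaks, include.lowest = TRUE)

ag <- aggregate(df$b, FUN = max, by = list(a = a_bin, c = df$c))
res <- reshape(ag, idvar = "a", timevar = "c", direction = "wide")

# Look up every bin level in the result; bins absent from res become NA rows
res_full <- res[match(levels(a_bin), res$a), ]
res_full$a <- levels(a_bin)
rownames(res_full) <- NULL
names(res_full) <- c("a_bin", "b_m(c=1)", "b_m(c=2)")
res_full
```

The bin labels here are the default interval labels produced by cut; pass a labels argument to cut if labels such as "1-3" are preferred.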