Group by bins and aggregate in R

Question

I have data like (a, b, c), where the range of a is divided into n (say 3) equal parts, and an aggregate function (say max) is computed over the b values in each part, grouped by c as well.

So the output looks like:

a_bin  b_m(c=1) b_m(c=2)
1-3     3          6
4-6     NaN        NaN
7-9     NaN        2

This is an M×N table, where M is the number of a bins and N is the number of unique c values (or the full range of c).

How do I approach this? Is there an R package that can help?
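Here is a minimal base-R sketch of the binning step. It assumes that a is numeric and that "n equal parts" means n equal-width intervals spanning the observed range of a. The example data is the small data frame used in Answer 3 below. n, breaks and the a_bin column name are illustrative choices.

```r
# Example data (the same values used in Answer 3)
df <- data.frame(a = c(1, 2, 9, 1),
                 b = c(2, 3, 2, 6),
                 c = c(1, 1, 2, 2))

n <- 3  # number of bins for a (assumed: equal width over range(a))

# n + 1 equally spaced break points from min(a) to max(a)
breaks <- seq(min(df$a), max(df$a), length.out = n + 1)

# include.lowest = TRUE so that min(a) falls into the first bin
df$a_bin <- cut(df$a, breaks = breaks, include.lowest = TRUE)

levels(df$a_bin)
```

Passing a single number such as breaks = n to cut() also gives n equal-width intervals. In that case, however, cut() moves the outer limits away by 0.1% of the range, so the interval labels are less tidy.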

Answer 1

There are probably easier ways.

If your dataset is dat:

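# Split a and b by c; within each group, bin a and take max(b) per bin
res <- sapply(split(dat[, c("a", "b")], dat$c), function(x) {
  a_bin <- cut(x$a, breaks = c(1, 3, 6, 9), include.lowest = TRUE,
               labels = c("1-3", "4-6", "7-9"))
  c(by(x$b, a_bin, FUN = max))  # empty bins give NA
})
# Rows of res are bins, columns are values of c
res1 <- setNames(data.frame(row.names(res), res),
                 c("a_bin", paste0("b_m(c=", colnames(res), ")")))
row.names(res1) <- 1:nrow(res1)

res1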
 a_bin b_m(c=1) b_m(c=2)
1   1-3        3        6
2   4-6       NA       NA
3   7-9       NA        2
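For comparison, the same M×N table can be built with a single tapply() call. This is a different base-R technique from the split/by approach above. Cross-classifying b by the a bin and by c returns a matrix with one row per bin and one column per value of c, and empty cells come back as NA. The sketch assumes columns named a, b and c and the same hard-coded breaks as above. The example dat uses the values from Answer 3.

```r
dat <- data.frame(a = c(1, 2, 9, 1),
                  b = c(2, 3, 2, 6),
                  c = c(1, 1, 2, 2))

# Bin a into the three ranges from the question
a_bin <- cut(dat$a, breaks = c(1, 3, 6, 9), include.lowest = TRUE,
             labels = c("1-3", "4-6", "7-9"))

# Rows: a bins; columns: values of c; cells: max(b), or NA if the cell is empty
m <- tapply(dat$b, list(a_bin = a_bin, c = dat$c), max)

# Optional: match the question's column naming
colnames(m) <- paste0("b_m(c=", colnames(m), ")")
m
```

Because a_bin is a factor, tapply() keeps all three levels, including the empty 4-6 bin.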

Answer 2

I would use a combination of data.table and reshape2, both of which are optimized for speed (no explicit loops from the apply family).

Note that the output won't include unused (empty) bins.

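library(data.table)

v <- c(1, 4, 7, 10)                  # bin edges: [1,4), [4,7), [7,10)
temp$int <- findInterval(temp$a, v)  # bin index (1, 2 or 3) for a in [1, 10)

# max(b) for every combination of c and bin
temp <- setDT(temp)[, list(b_m = max(b)), by = c("c", "int")]

# Wide format: one row per bin, one column per value of c.
# Current data.table versions provide dcast() for data.tables themselves,
# so loading reshape2 is no longer required.
temp <- dcast(temp, int ~ c, value.var = "b_m")
## Optional, for a prettier table:
## temp[, int := c("1-3", "4-6", "7-9")[int]]
## setnames(temp, c("a_bin", paste0("b_m(c=", names(temp)[-1], ")")))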

##   a_bin b_m(c=1) b_m(c=2)
## 1   1-3        3        6
## 2   7-9       NA        2
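Answer 2 notes that empty bins are dropped. One way to bring them back is sketched below. It assumes a reasonably recent data.table, which provides dcast() and CJ(). The idea is to right-join the aggregated table onto the full grid of c values × bin indices, so that missing combinations appear with b_m = NA before casting. The names agg, grid, full and wide are illustrative, and the data is Answer 3's example.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 9, 1),
                 b = c(2, 3, 2, 6),
                 c = c(1, 1, 2, 2))

v <- c(1, 4, 7, 10)              # bin edges, as in Answer 2
dt[, int := findInterval(a, v)]  # bin index 1, 2 or 3

agg <- dt[, list(b_m = max(b)), by = c("c", "int")]

# Every value of c crossed with every bin index, including unused bins
grid <- CJ(c = unique(agg$c), int = seq_len(length(v) - 1))

# Right join onto the grid: combinations absent from agg get b_m = NA
full <- agg[grid, on = c("c", "int")]

wide <- dcast(full, int ~ c, value.var = "b_m")
wide
```

Every cell of the grid now has exactly one row, so dcast() needs no aggregation function, and bin 2 appears as a row of NAs.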

Answer 3

A combination of aggregate, cut and reshape seems to work:

df <- data.frame(a = c(1, 2, 9, 1),
                 b = c(2, 3, 2, 6),
                 c = c(1, 1, 2, 2))

breaks <- c(0, 3, 6, 9)

# Aggregate data: max(b) for each (a bin, c) combination that occurs
ag <- aggregate(df$b, FUN = max,
                by = list(a = cut(df$a, breaks, include.lowest = TRUE), c = df$c))

# Reshape data: one row per a bin, one column per value of c
res <- reshape(ag, idvar = "a", timevar = "c", direction = "wide")
res
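In the code above, df$b is passed to aggregate() as a bare vector, so the aggregated column is named x and reshape() produces columns x.1, x.2. The variant sketched below instead bins a first and uses aggregate()'s formula interface, which keeps the name b. It then renames the wide columns to the question's b_m(c=…) style. As in Answer 2, bins with no data are still dropped. The a_bin column name is an illustrative choice.

```r
df <- data.frame(a = c(1, 2, 9, 1),
                 b = c(2, 3, 2, 6),
                 c = c(1, 1, 2, 2))

# Bin a first, so the formula interface of aggregate() can use it
df$a_bin <- cut(df$a, breaks = c(0, 3, 6, 9), include.lowest = TRUE)

# Long format: one row per (a_bin, c) combination that occurs, value column "b"
ag <- aggregate(b ~ a_bin + c, data = df, FUN = max)

# Wide format: columns b.1, b.2, ... (one per value of c)
res <- reshape(ag, idvar = "a_bin", timevar = "c", direction = "wide")

# Rename b.<c> to b_m(c=<c>) and reset the row names
names(res) <- sub("^b\\.(.*)$", "b_m(c=\\1)", names(res))
rownames(res) <- NULL
res
```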


3 solutions

Solution 1
2 2014-06-22 10:20:20

Solution 2
2 2014-06-22 10:25:30

Solution 3
2 2014-06-22 10:29:22
