Aggregate over several variables in R
I have a rather large dataset in long format where I need to count the number of instances of each ID with respect to two different variables, A and B. For example, the same person can be represented in multiple rows because of either A or B. Counting the number of instances of each ID is not too hard, but I also need to count the number of instances of each ID due to A and due to B, and return these counts as variables in the dataset.
Regards,
//Mi
The ddply() function from the package plyr lets you break data apart by identifier variables, perform a function on each chunk, and then assemble it all back together. So you need to break your data apart by identifier and A/B status, count how many times each of those combinations occurs (using nrow()), and then put those counts back together.
Using wkmor1's df:
library(plyr)
# count the rows (nrow) in each ID x GRP chunk; the count ends up in column V1
x <- ddply(.data = df, .variables = c("ID", "GRP"), .fun = nrow)
which returns:
ID GRP V1
1 1 a 2
2 1 b 2
3 2 a 2
4 2 b 2
And then merge that back onto the original data:
merge(x, df, by = c("ID", "GRP"))
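If plyr is not available, the same split, count, and merge idea can be sketched in base R. This swaps ddply() for aggregate() with its formula interface (an alternative technique, not part of the answer above). It rebuilds wkmor1's example df so it runs on its own; the constant column V1 is only a dummy whose rows get counted.

```r
# wkmor1's example data
df <- data.frame(ID = rep(c(1, 2), 4), GRP = rep(c("a", "a", "b", "b"), 2))

# Add a dummy column V1 and count its rows within each ID x GRP combination
counts <- aggregate(V1 ~ ID + GRP, data = transform(df, V1 = 1L), FUN = length)

# Attach each combination's count back onto every original row
merged <- merge(df, counts, by = c("ID", "GRP"))
```

As with the plyr version, merge() returns the rows sorted by the key columns rather than in their original order.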
Well, as I understand it, the fastest and simplest solution is...
df$IDCount <- ave(df$ID, df$group, FUN = length)
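A runnable sketch of the ave() idea on wkmor1's example data. It assumes the `group` column in the line above corresponds to the GRP column there, and adds both counts the question asks for: rows per ID, and rows per ID and GRP combination.

```r
# wkmor1's example data; GRP is assumed to play the role of `group`
df <- data.frame(ID = rep(c(1, 2), 4), GRP = rep(c("a", "a", "b", "b"), 2))

# ave(x, ..., FUN) applies FUN within each group defined by ... and returns a
# vector as long as x, so the result can be assigned directly as a new column
df$IDCount    <- ave(df$ID, df$ID, FUN = length)          # rows per ID
df$IDGrpCount <- ave(df$ID, df$ID, df$GRP, FUN = length)  # rows per ID x GRP
```

Because ave() works row by row, it keeps the original row order and needs no merge step.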
Here is one approach using table() to count the rows meeting your criteria, and merge() to add the frequencies back to the data frame.
> df<-data.frame(ID=rep(c(1,2),4),GRP=rep(c("a","a","b","b"),2))
> id.frq <- as.data.frame(table(df$ID))
> colnames(id.frq) <- c('ID','ID.FREQ')
> df <- merge(df,id.frq)
> grp.frq <- as.data.frame(table(df$ID,df$GRP))
> colnames(grp.frq) <- c('ID','GRP','GRP.FREQ')
> df <- merge(df,grp.frq)
> df
ID GRP ID.FREQ GRP.FREQ
1 1 a 4 2
2 1 a 4 2
3 1 b 4 2
4 1 b 4 2
5 2 a 4 2
6 2 a 4 2
7 2 b 4 2
8 2 b 4 2
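A variation on the table() approach, offered as a sketch: instead of merge() (which reorders the rows), each row's count can be looked up by name in the table. `add_freq` is a hypothetical helper, not a base R function, and the "\r" separator used to build the keys is assumed never to occur in the data.

```r
# wkmor1's example data
df <- data.frame(ID = rep(c(1, 2), 4), GRP = rep(c("a", "a", "b", "b"), 2))

# Hypothetical helper: adds column `name` holding, for every row, the number of
# rows sharing its values of `vars`. It pastes those values into one string
# key per row, counts the keys with table(), then indexes the table by key.
add_freq <- function(data, vars, name) {
  key <- do.call(paste, c(data[vars], sep = "\r"))
  data[[name]] <- as.vector(table(key)[key])
  data
}

df <- add_freq(df, "ID", "ID.FREQ")
df <- add_freq(df, c("ID", "GRP"), "GRP.FREQ")
```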