aggregate over 2 groups

I'm trying to understand how to aggregate my output. I've created some dummy data that approximates my actual data, which has hundreds of group1 values, 3 levels of group2, and several dozen validation logicals. Apologies if this seems simple. I've hunted and pecked a lot, and have to say that as a newbie to R, the huge variety of tools out there (the apply family, ddply, aggregate, table, reshape, etc.) is both wonderful and a bit scary :)

#create data
group1 <- paste("Group", LETTERS[1:7])
group2 <- c("UNC", "UNC", "SS", "LS", "LS", "SS", "UNC")
valid1 <- c("Y", "N", NA, "N", "Y", "Y", "N")
valid2 <- c("N", "N", "Y", "N", "N", "Y", "N")
valid3 <- c(1.4, 1.2, NA, 0.7, 0.3, NA, 1.7)
valid4 <- c(0.4, 0.3, 0.53, 0.66, 0.3, 0.3, 0.71)
valid5 <- c(8.5, 11.2, NA, NA, 8.3, NA, 11.7)

#data.frame() on its own keeps each column's type
testdata <- data.frame(group1, group2, valid1, valid2, valid3, valid4, valid5)

valid <- function(testdata){
  #ifelse() is vectorised, so no loop over the groups is needed
  val1 <- ifelse(testdata$valid1=="Y", 1, 0)
  val2 <- ifelse(testdata$valid2=="Y", 1, 0)
  val3 <- ifelse(testdata$valid3>=1.0, 1, 0)
  val4 <- ifelse(testdata$valid4<=0.5, 1, 0)
  val5 <- ifelse(testdata$valid5>=10.0, 1, 0)

  #cbind() coerces every column to character, hence the as.numeric() step further down
  test.out <- data.frame(cbind(group1=testdata$group1, group2=testdata$group2,
                               val1, val2, val3, val4, val5),
                         stringsAsFactors=FALSE)
  test.out
}
validtry <- valid(testdata)

Then I need to turn these logicals into numerics so they can be summed:

#make validations numeric
# why doesn't this work:
# validtry[,3:7] <- as.numeric(validtry[,3:7])
#but these do
validtry[,3] <- as.numeric(validtry[,3])
validtry[,4] <- as.numeric(validtry[,4])
validtry[,5] <- as.numeric(validtry[,5])
validtry[,6] <- as.numeric(validtry[,6])
validtry[,7] <- as.numeric(validtry[,7])
######
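
On the commented-out line: `validtry[,3:7]` is a data frame, i.e. a list of columns, and `as.numeric()` cannot coerce a list of multi-element columns to a single numeric vector. A minimal sketch of converting several columns in one step is below. The toy data frame `df` and its column names are made up for illustration and only mimic the character columns that `data.frame(cbind(...))` produces.

```r
# Toy data frame whose flag columns are stored as character,
# similar to what data.frame(cbind(...)) produces (illustrative data only)
df <- data.frame(
  group2 = c("UNC", "SS", "LS"),
  val1   = c("1", NA, "0"),
  val2   = c("0", "1", "0"),
  stringsAsFactors = FALSE
)

# as.numeric() on the whole block fails because a data frame is a list of columns
res <- try(as.numeric(df[, 2:3]), silent = TRUE)
inherits(res, "try-error")

# Convert column by column instead; assigning into df[2:3] keeps the data frame
df[2:3] <- lapply(df[2:3], as.numeric)
str(df)
```

If the columns are factors rather than character (the `data.frame()` default before R 4.0), converting with `as.numeric(as.character(x))` avoids getting the internal level codes instead of the original values.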

#summarize validtry
#sum on both groups
aggregate(validtry[,3:7], by=list(validtry$group1, validtry$group2), sum, na.rm=TRUE)

#sum on one group
aggregate(validtry[,3:7], by=list(validtry$group2), sum, na.rm=TRUE)

So these last two get me close, but I think I need something different? I'm trying to sum across both rows and columns for the two groups. I'm familiar with tapply, but that doesn't seem to get me there.

Thanks in advance!!
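
One possible reading of "sum across both rows and columns" is a single total per group1/group2 pair that counts every validation that passed. Whether that is the intended output is an assumption. Under that assumption, a base-R sketch could look like this; the `flags` data frame and the `total` column are hypothetical names and values used only for illustration.

```r
# Hypothetical 0/1 validation flags (NA = validation could not be evaluated)
flags <- data.frame(
  group1 = c("Group A", "Group A", "Group B", "Group B"),
  group2 = c("UNC", "SS", "UNC", "UNC"),
  val1   = c(1, 0, NA, 1),
  val2   = c(0, 1, 1, 1)
)

# Sum across columns: number of passed validations in each row
flags$total <- rowSums(flags[c("val1", "val2")], na.rm = TRUE)

# Sum across rows: one total per group1 x group2 combination
totals <- xtabs(total ~ group1 + group2, data = flags)
totals
```

`xtabs()` returns a group1 by group2 table and fills combinations that never occur with 0, which is convenient when not every group1 appears at every level of group2.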

The expected output is not clear. My guess is:

 testdata <- data.frame(group1, group2, valid1, valid2, valid3, valid4, valid5)
 str1 <- c("valid1=='Y'", "valid2=='Y'", "valid3>=1.0", "valid4 <=0.5", "valid5>=10.0")
 validtry <- testdata

 #eval(parse(...)) works here, but it is generally not recommended
 validtry[,-(1:2)] <- lapply(str1, function(x) 1*with(testdata, eval(parse(text=x))))

 library(reshape2) 
 lst <-  lapply(validtry[3:7], function(x)
       dcast(data.frame(validtry[1:2], x), group1~group2, value.var="x", sum, na.rm=TRUE))

 lst[[1]]
 #   group1 LS SS UNC
 #1 Group A  0  0   1
 #2 Group B  0  0   0
 #3 Group C  0  0   0
 #4 Group D  0  0   0
 #5 Group E  1  0   0
 #6 Group F  0  1   0
 #7 Group G  0  0   0
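
Since `eval(parse(...))` is flagged above as not recommended, the same 0/1 flags can also be built without strings. The sketch below assumes the thresholds from the question. The `rules` list is a name chosen here for illustration and is not part of the original answer; it uses `aggregate()` as in the question rather than `dcast()`.

```r
testdata <- data.frame(
  group1 = paste("Group", LETTERS[1:7]),
  group2 = c("UNC", "UNC", "SS", "LS", "LS", "SS", "UNC"),
  valid1 = c("Y", "N", NA, "N", "Y", "Y", "N"),
  valid2 = c("N", "N", "Y", "N", "N", "Y", "N"),
  valid3 = c(1.4, 1.2, NA, 0.7, 0.3, NA, 1.7),
  valid4 = c(0.4, 0.3, 0.53, 0.66, 0.3, 0.3, 0.71),
  valid5 = c(8.5, 11.2, NA, NA, 8.3, NA, 11.7)
)

# One predicate per validation column, keyed by column name (hypothetical helper list)
rules <- list(
  valid1 = function(x) x == "Y",
  valid2 = function(x) x == "Y",
  valid3 = function(x) x >= 1.0,
  valid4 = function(x) x <= 0.5,
  valid5 = function(x) x >= 10.0
)

validtry <- testdata
# Apply each rule to its own column; as.integer() maps TRUE/FALSE/NA to 1/0/NA
validtry[names(rules)] <- Map(function(f, x) as.integer(f(x)),
                              rules, testdata[names(rules)])

# Sum the flags within each group1 x group2 pair
out <- aggregate(validtry[names(rules)],
                 by = validtry[c("group1", "group2")],
                 FUN = sum, na.rm = TRUE)
out
```

Keeping the rules as functions means a typo in a condition fails as ordinary R code when the list is defined, and adding a validation is one more entry in `rules`.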
