How do I group rows in R when some rows (which belong to other subsets) belong to multiple groups?

Question

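Here is my data. I want to group the rows by date, ID1 and ID2. Rows are then grouped by ID3 as subsets of those groups, i.e. within all rows whose first and second IDs match. In addition, I want to sum the stats and produce n().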

      date     ID1     ID2     ID3 stat1 stat2 stat3
1 12-03-07  abc123  wxy456  pqr123    10    20     30  
2 12-03-07  abc123  wxy456  pqr123    20    40     60  
3 10-04-07  bcd456  wxy456  hgf356    10    20     40  
4 12-03-07  abc123  wxy456  hfz123    30    60     90  
5 12-03-07  abc123  wxy456   <NA>     40    50     70

Expected output

date      ID1     ID2     ID3     n  stat1  stat2  stat3
12-03-07  abc123  wxy456  pqr123  3     70    110    160
10-04-07  bcd456  wxy456  hgf356  1     10     20     40
12-03-07  abc123  wxy456  hfz123  2     40     50     70

Answer 1

There may be a more elegant solution, but this solves it with dplyr's group_by / summarise (as in Adam Quek's code), followed by a join and a computation of the means.

        library(dplyr)

        # Summarise per (date, ID1, ID2, ID3): row count and stat totals
        df <- df %>% group_by(date, ID1, ID2, ID3) %>% summarise(n=n(), stat1=sum(stat1), stat2=sum(stat2), stat3=sum(stat3))

        # Select instances where NA
        dfNA <- df %>% filter(is.na(ID3))

        # Select instances where no NA
        df1 <- df %>% filter(!is.na(ID3))

        # Join these
        dfBig <- df1 %>% full_join(dfNA, by = c("date", "ID1")) %>% 
          subset(select= c("ID1", "date", "n.x", "n.y", "stat1.x", "stat1.y", "stat2.x", "stat2.y", "stat3.x", "stat3.y"))

        # Replace <NA>s by 0
        dfBig$stat1.x[is.na(dfBig$stat1.x)] <- 0
        dfBig$stat1.y[is.na(dfBig$stat1.y)] <- 0
        dfBig$stat2.x[is.na(dfBig$stat2.x)] <- 0
        dfBig$stat2.y[is.na(dfBig$stat2.y)] <- 0
        dfBig$stat3.x[is.na(dfBig$stat3.x)] <- 0
        dfBig$stat3.y[is.na(dfBig$stat3.y)] <- 0
        dfBig$n.x[is.na(dfBig$n.x)] <- 0
        dfBig$n.y[is.na(dfBig$n.y)] <- 0

        # Compute Mean stats and Rename Columns
        dfBig$stat1Mean <- (dfBig$stat1.x * dfBig$n.x + dfBig$stat1.y * dfBig$n.y) / (dfBig$n.x + dfBig$n.y)
        dfBig$stat2Mean <- (dfBig$stat2.x * dfBig$n.x + dfBig$stat2.y * dfBig$n.y) / (dfBig$n.x + dfBig$n.y)
        dfBig$stat3Mean <- (dfBig$stat3.x * dfBig$n.x + dfBig$stat3.y * dfBig$n.y) / (dfBig$n.x + dfBig$n.y)
        dfBig$n2 <- dfBig$n.x + dfBig$n.y
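
Since the question asks for summed stats rather than means, here is a minimal sketch of one way to get totals directly. It is not the answer's method. It assumes that rows with a missing ID3 should be added to every ID3 group that shares the same date, ID1 and ID2. The names `known`, `shared`, `n_na`, `s1`, `s2` and `s3` are made up for illustration, and the data frame is retyped from the question.

```r
library(dplyr)

# Sample data retyped from the question (ID3 is NA in the last row)
df <- data.frame(
  date  = c("12-03-07", "12-03-07", "10-04-07", "12-03-07", "12-03-07"),
  ID1   = c("abc123", "abc123", "bcd456", "abc123", "abc123"),
  ID2   = c("wxy456", "wxy456", "wxy456", "wxy456", "wxy456"),
  ID3   = c("pqr123", "pqr123", "hgf356", "hfz123", NA),
  stat1 = c(10, 20, 10, 30, 40),
  stat2 = c(20, 40, 20, 60, 50),
  stat3 = c(30, 60, 40, 90, 70),
  stringsAsFactors = FALSE
)

# Totals for rows that have an ID3: one row per (date, ID1, ID2, ID3)
known <- df %>%
  filter(!is.na(ID3)) %>%
  group_by(date, ID1, ID2, ID3) %>%
  summarise(n = n(), stat1 = sum(stat1), stat2 = sum(stat2), stat3 = sum(stat3)) %>%
  ungroup()

# Totals for rows without an ID3: one row per (date, ID1, ID2)
shared <- df %>%
  filter(is.na(ID3)) %>%
  group_by(date, ID1, ID2) %>%
  summarise(n_na = n(), s1 = sum(stat1), s2 = sum(stat2), s3 = sum(stat3)) %>%
  ungroup()

# Attach the shared totals to every matching ID3 group and add them in.
# coalesce() turns the NA left by a missing match into 0.
result <- known %>%
  left_join(shared, by = c("date", "ID1", "ID2")) %>%
  mutate(
    n     = n + coalesce(n_na, 0L),
    stat1 = stat1 + coalesce(s1, 0),
    stat2 = stat2 + coalesce(s2, 0),
    stat3 = stat3 + coalesce(s3, 0)
  ) %>%
  select(date, ID1, ID2, ID3, n, stat1, stat2, stat3)
```

Under that assumption, `result` matches the pqr123 and hgf356 rows of the expected output. For hfz123 it gives n = 2 with sums 70, 110 and 160, whereas the question lists 40, 50 and 70, so it is unclear which of the two the asker intended.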


Solution 1
0 Accepted 2017-07-13 22:19:11
