Trying to assign groups in R, but it is filling in NA values and missing others that belong in the group
How do I group by in R where some rows (that are subsets of others) belong to multiple groups?
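Here is my data. I want to group the rows by date, ID1 and ID2. Within those groups, the rows are grouped by ID3 as subsets, i.e. each ID3 group also includes every row whose first and second IDs (and date) match, so a row with ID3 = NA belongs to every ID3 group with the same date, ID1 and ID2. I also want to sum the stats and get the row count with n().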
```
      date    ID1    ID2    ID3 stat1 stat2 stat3
1 12-03-07 abc123 wxy456 pqr123    10    20    30
2 12-03-07 abc123 wxy456 pqr123    20    40    60
3 10-04-07 bcd456 wxy456 hgf356    10    20    40
4 12-03-07 abc123 wxy456 hfz123    30    60    90
5 12-03-07 abc123 wxy456   <NA>    40    50    70
```
Desired output:

```
    date    ID1    ID2    ID3 n stat1 stat2 stat3
12-03-07 abc123 wxy456 pqr123 3    70   110   160
10-04-07 bcd456 wxy456 hgf356 1    10    20    40
12-03-07 abc123 wxy456 hfz123 2    70   110   160
```
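To see why a plain grouping is not enough, here is a minimal sketch that types the sample data in as a data frame and groups on all four IDs at once. It assumes dplyr 1.0 or later (for `across()` and `.groups`) and stores the stats as doubles; the names `df` and `naive` are illustrative. The `<NA>` row comes out as a separate fourth group instead of being added to the pqr123 and hfz123 totals.

```r
library(dplyr)

# Sample data from the question (stats stored as doubles for simplicity)
df <- data.frame(
  date  = c("12-03-07", "12-03-07", "10-04-07", "12-03-07", "12-03-07"),
  ID1   = c("abc123", "abc123", "bcd456", "abc123", "abc123"),
  ID2   = rep("wxy456", 5),
  ID3   = c("pqr123", "pqr123", "hgf356", "hfz123", NA),
  stat1 = c(10, 20, 10, 30, 40),
  stat2 = c(20, 40, 20, 60, 50),
  stat3 = c(30, 60, 40, 90, 70),
  stringsAsFactors = FALSE
)

# A plain grouping treats ID3 = NA as a group of its own,
# so that row is not counted in the pqr123 or hfz123 groups
naive <- df %>%
  group_by(date, ID1, ID2, ID3) %>%
  summarise(n = n(), across(c(stat1, stat2, stat3), sum), .groups = "drop")
print(naive)
```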
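There may be a more elegant solution, but I solved this with dplyr's group_by/summarise (as in Adam Quek's code): summarise each group, join the ID3 = NA summaries onto the other groups, and then compute the combined statistics and means.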
```r
library(dplyr)

# Summarise each date/ID1/ID2/ID3 group (rows with ID3 = NA form their own group)
df <- df %>%
  group_by(date, ID1, ID2, ID3) %>%
  summarise(n = n(), stat1 = sum(stat1), stat2 = sum(stat2), stat3 = sum(stat3),
            .groups = "drop")

# Select instances where ID3 is NA
dfNA <- df %>% filter(is.na(ID3))
# Select instances where ID3 is not NA
df1 <- df %>% filter(!is.na(ID3))

# Join these on every shared key so each ID3 group picks up the NA totals;
# full_join also keeps groups that only have NA rows (their ID3 stays NA)
dfBig <- df1 %>%
  full_join(dfNA, by = c("date", "ID1", "ID2")) %>%
  select(date, ID1, ID2, ID3 = ID3.x, n.x, n.y,
         stat1.x, stat1.y, stat2.x, stat2.y, stat3.x, stat3.y)

# Replace <NA>s by 0; each column is tested against itself
dfBig <- dfBig %>%
  mutate(across(c(n.x, n.y, stat1.x:stat3.y), ~ ifelse(is.na(.x), 0, .x)))

# Combine both parts: stat*.x and stat*.y are already sums over n.x and n.y rows,
# so the totals simply add and the means divide by the combined count
dfBig <- dfBig %>%
  mutate(n2 = n.x + n.y,
         stat1 = stat1.x + stat1.y, stat2 = stat2.x + stat2.y, stat3 = stat3.x + stat3.y,
         stat1Mean = stat1 / n2, stat2Mean = stat2 / n2, stat3Mean = stat3 / n2)
```
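A possible alternative, sketched here under assumptions and not taken from the answer: instead of joining summaries and patching NAs, copy each ID3 = NA row onto every ID3 that shares its date, ID1 and ID2, then run a single group_by/summarise. It assumes dplyr 1.0 or later and the sample data from the question; `named`, `keys`, `shared` and `result` are illustrative names. Base R `merge()` does the copying, matching every NA row with every ID3 of the same date/ID1/ID2.

```r
library(dplyr)

df <- data.frame(
  date  = c("12-03-07", "12-03-07", "10-04-07", "12-03-07", "12-03-07"),
  ID1   = c("abc123", "abc123", "bcd456", "abc123", "abc123"),
  ID2   = rep("wxy456", 5),
  ID3   = c("pqr123", "pqr123", "hgf356", "hfz123", NA),
  stat1 = c(10, 20, 10, 30, 40),
  stat2 = c(20, 40, 20, 60, 50),
  stat3 = c(30, 60, 40, 90, 70),
  stringsAsFactors = FALSE
)

# Rows that already name an ID3 keep it
named <- df %>% filter(!is.na(ID3))
# Every ID3 present for each date/ID1/ID2 combination
keys <- named %>% distinct(date, ID1, ID2, ID3)
# Copy each NA-ID3 row onto every ID3 sharing its date/ID1/ID2;
# all.x = TRUE keeps NA rows with no named ID3 (their ID3 stays NA)
shared <- merge(df %>% filter(is.na(ID3)) %>% select(-ID3), keys,
                by = c("date", "ID1", "ID2"), all.x = TRUE)

# One grouping pass over the original rows plus the copies
result <- bind_rows(named, shared) %>%
  group_by(date, ID1, ID2, ID3) %>%
  summarise(n = n(), across(c(stat1, stat2, stat3), sum), .groups = "drop")
print(result)
```

With the sample data this gives hgf356 with n = 1 and sums 10/20/40, hfz123 with n = 2 and sums 70/110/160, and pqr123 with n = 3 and sums 70/110/160. Means, if needed, are each sum divided by `n`.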
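Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.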