R tibble: Group by column A, keep only distinct values in columns B and C, and sum values in column C
I want to group by column A and then sum the values in column C over the distinct combinations of columns B and C. Is it possible to do this inside the summarise clause? I know it's possible by calling distinct() before the aggregation. What about something like the following?

Data:
df <- tibble(A = c(1,1,1,2,2), B = c('a','b','b','a','a'), C=c(5,10,10,15,15))
My attempt, which doesn't work:
df %>%
group_by(A) %>%
summarise(sumC=sum(distinct(B,C) %>% select(C)))
Desired output:
A sumC
1 15
2 15
You could use duplicated:
df %>%
group_by(A) %>%
summarise(sumC = sum(C[!duplicated(B)]))
## A tibble: 2 x 2
# A sumC
# <dbl> <dbl>
#1 1 15
#2 2 15
Or with distinct:
df %>%
group_by(A) %>%
distinct(B, C) %>%
summarise(sumC = sum(C))
## A tibble: 2 x 2
# A sumC
# <dbl> <dbl>
#1 1 15
#2 2 15
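One caveat on the duplicated(B) version: it deduplicates on B alone, so it matches the distinct(B, C) version only when each B has a single C value within a group, as in the question's data. The sketch below uses a hypothetical data set, df2 (not from the question), in which the same B appears with different C values, and compares deduplicating on B with deduplicating on the (B, C) pair via duplicated() on a data frame.

```r
library(dplyr)

# Hypothetical data (an assumption for illustration): within A == 1,
# B == 'a' occurs with two different C values (5 and 10)
df2 <- tibble(A = c(1, 1, 1, 2, 2),
              B = c('a', 'a', 'a', 'a', 'a'),
              C = c(5, 10, 10, 15, 15))

# Deduplicate on B only: keeps just the first row for each B per group
by_b <- df2 %>%
  group_by(A) %>%
  summarise(sumC = sum(C[!duplicated(B)]))

# Deduplicate on the (B, C) pair: duplicated() on a data frame works row-wise
by_bc <- df2 %>%
  group_by(A) %>%
  summarise(sumC = sum(C[!duplicated(data.frame(B, C))]))

by_b$sumC   # 5 15
by_bc$sumC  # 15 15
```

If B and C always vary together, the two forms give the same result and the shorter one is fine.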
A different possibility could be:
df %>%
group_by(A, B, C) %>%
slice(1) %>%
group_by(A) %>%
summarise(sumC = sum(C))
A sumC
<dbl> <dbl>
1 1 15
2 2 15
Or a twist on @Maurits Evers' answer:
df %>%
distinct(A, B, C) %>%
group_by(A) %>%
summarise(sumC = sum(C))
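To stay closer to the attempt in the question: that attempt fails because distinct() expects a data frame as its first argument, whereas inside summarise B and C are plain vectors for the current group. One possible workaround, shown here as a sketch rather than the canonical approach, is to wrap the two vectors in a tibble first and then pull out C.

```r
library(dplyr)

df <- tibble(A = c(1, 1, 1, 2, 2),
             B = c('a', 'b', 'b', 'a', 'a'),
             C = c(5, 10, 10, 15, 15))

res <- df %>%
  group_by(A) %>%
  # Build a per-group tibble from B and C, drop duplicate rows, then sum C
  summarise(sumC = sum(distinct(tibble(B, C))$C))

res$sumC  # 15 15
```

This keeps everything inside summarise, at the cost of building a small tibble per group; calling distinct() before the aggregation is likely simpler and faster on large data.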