
R tibble: Group by column A, keep only distinct values in columns B and C, and sum values in column C

I want to group by column A and then sum the values in column C, counting each distinct combination of columns B and C only once. Is it possible to do this inside the summarise clause? I know it is possible by calling distinct() before the aggregation. What about something like the following?

Data:

library(dplyr)  # provides tibble(), %>%, group_by(), summarise(), distinct()
df <- tibble(A = c(1,1,1,2,2), B = c('a','b','b','a','a'), C = c(5,10,10,15,15))

My attempt, which doesn't work:

df %>% 
group_by(A) %>% 
summarise(sumC=sum(distinct(B,C) %>% select(C)))

Desired output:

A sumC
1 15
2 15

You could use duplicated:

df %>%
    group_by(A) %>%
    summarise(sumC = sum(C[!duplicated(B)]))
## A tibble: 2 x 2
#      A  sumC
#  <dbl> <dbl>
#1     1    15
#2     2    15
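
Note that duplicated(B) only looks at column B, so it matches the question only when repeated B values within a group always carry the same C. A minimal sketch of a variant that checks the (B, C) pair instead, assuming dplyr is installed; df2 is hypothetical data made up here to show the difference, not part of the question:

```r
library(dplyr)

df <- tibble(A = c(1,1,1,2,2), B = c('a','b','b','a','a'), C = c(5,10,10,15,15))

# Drop rows whose (B, C) pair was already seen within the group
res_pairs <- df %>%
    group_by(A) %>%
    summarise(sumC = sum(C[!duplicated(data.frame(B, C))]))
# sumC is 15 for A = 1 and 15 for A = 2, same as the desired output

# Hypothetical data: the same B appears with two different C values
df2 <- tibble(A = c(1,1,1), B = c('a','b','b'), C = c(5,10,20))

by_B    <- df2 %>% group_by(A) %>% summarise(sumC = sum(C[!duplicated(B)]))
by_pair <- df2 %>% group_by(A) %>% summarise(sumC = sum(C[!duplicated(data.frame(B, C))]))
# by_B$sumC is 15 (the row with C = 20 is dropped); by_pair$sumC is 35
```

On the data from the question both variants give the same result; they differ only when a B value repeats with different C values.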

Or with distinct:

df %>%
    group_by(A) %>%
    distinct(B, C) %>%
    summarise(sumC = sum(C))
## A tibble: 2 x 2
#      A  sumC
#  <dbl> <dbl>
#1     1    15
#2     2    15

A different possibility could be:

df %>%
 group_by(A, B, C) %>%
 slice(1) %>%
 group_by(A) %>%
 summarise(sumC = sum(C))

      A  sumC
  <dbl> <dbl>
1     1    15
2     2    15

Or a twist on @Maurits Evers's answer:

df %>%
 distinct(A, B, C) %>%
 group_by(A) %>%
 summarise(sumC = sum(C))
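
For the original question of doing everything inside summarise: the attempt fails because distinct() expects a data frame as its first argument, while B inside summarise is a plain vector. A minimal sketch that stays close to the attempt, assuming dplyr is installed, builds a small tibble from the group's B and C before calling distinct():

```r
library(dplyr)

df <- tibble(A = c(1,1,1,2,2), B = c('a','b','b','a','a'), C = c(5,10,10,15,15))

# Inside summarise, B and C are the current group's vectors;
# wrap them in a tibble so distinct() has a data frame to work on
res_inside <- df %>%
    group_by(A) %>%
    summarise(sumC = sum(distinct(tibble(B, C))$C))
# sumC is 15 for A = 1 and 15 for A = 2
```

This builds one small tibble per group, so on large data calling distinct() before summarise, as in the answers above, is likely the cheaper choice.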
