
How to do conditional grouping and summarising in dplyr (R)

How can I combine the rows of a data frame in R by Name, summing the score and sum columns, while taking the Status from the row with the largest sum value?

For example, given this input:

Name   score1 score2 score3 sum Status
John        1      1      0   2 A
John        0      3      0   3 B
Smith       0      1      3   4 A
Sean        1      2      1   4 A
Sean        1      0      2   3 B
Sean        5      1      1   7 C
Carl        0      1      1   2 A

I expect this output:

Name   score1 score2 score3 sum Status
John        1      4      0   5 B
Smith       0      1      3   4 A
Sean        7      3      4  14 C
Carl        0      1      1   2 A

For each Name, we can total the sum column and take the Status from the row with the largest sum.

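library(dplyr)
df %>%
  group_by(Name) %>%
  # Name the total `Sum` (capital S): summarise() evaluates its arguments in
  # order, so reusing the name `sum` would make which.max(sum) see the total
  summarise(Sum = sum(sum), Status = Status[which.max(sum)])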

#  Name    Sum Status
#  <fct> <int> <fct> 
#1 Carl      2 A     
#2 John      5 B     
#3 Sean     14 C     
#4 Smith     4 A     
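The snippet above returns only the total and the Status, while the expected output also sums score1 to score3. Below is a minimal sketch of the full result. It assumes dplyr 1.0.0 or later (for across() and relocate()) and rebuilds the question's data with data.frame() using character columns instead of factors, so the block runs on its own.

```r
library(dplyr)

# The question's data, rebuilt with character columns (an assumption for
# brevity; the original data uses factors)
df <- data.frame(
  Name   = c("John", "John", "Smith", "Sean", "Sean", "Sean", "Carl"),
  score1 = c(1L, 0L, 0L, 1L, 1L, 5L, 0L),
  score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
  score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L),
  sum    = c(2L, 3L, 4L, 4L, 3L, 7L, 2L),
  Status = c("A", "B", "A", "A", "B", "C", "A"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Name) %>%
  summarise(
    # Pick Status before across(): summarise() evaluates expressions in order,
    # so if across() ran first, `sum` would already be the one-value group
    # total and which.max(sum) would always return 1
    Status = Status[which.max(sum)],
    # Total every score column plus `sum`; base::sum avoids any clash with
    # the column that is also called `sum`
    across(score1:sum, base::sum)
  ) %>%
  # Move Status back to the end to match the expected column order
  relocate(Status, .after = last_col())

print(res)
```

which.max() returns the position of the first maximum, so if two rows in a group tie on sum, the earlier row's Status is kept.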

Or, using the same logic with data.table:

library(data.table)
# setDT() converts df to a data.table in place (by reference)
setDT(df)[, .(Sum = sum(sum), Status = Status[which.max(sum)]), by = Name]
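The same idea can be extended to total the score columns as well. This is a minimal data.table sketch, assuming the question's data is rebuilt as a standalone data.table with character columns; `cols` is just a helper name introduced for this example.

```r
library(data.table)

# The question's data as a data.table (character columns, an assumption)
dt <- data.table(
  Name   = c("John", "John", "Smith", "Sean", "Sean", "Sean", "Carl"),
  score1 = c(1L, 0L, 0L, 1L, 1L, 5L, 0L),
  score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
  score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L),
  sum    = c(2L, 3L, 4L, 4L, 3L, 7L, 2L),
  Status = c("A", "B", "A", "A", "B", "C", "A")
)

# Hypothetical helper: the columns to total within each group
cols <- c("score1", "score2", "score3", "sum")

# Sum each column in .SD, then append the Status of the row with the largest
# `sum`; Status is referenced by name in j, so data.table makes it available
res <- dt[, c(lapply(.SD, base::sum), list(Status = Status[which.max(sum)])),
          by = Name, .SDcols = cols]

print(res)
```

Unlike the dplyr result, `by =` keeps groups in order of first appearance (John, Smith, Sean, Carl); `keyby =` would sort them by Name instead.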

Data used above:

df <- structure(list(
  Name = structure(c(2L, 2L, 4L, 3L, 3L, 3L, 1L),
                   .Label = c("Carl", "John", "Sean", "Smith"), class = "factor"),
  score1 = c(1L, 0L, 0L, 1L, 1L, 5L, 0L),
  score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
  score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L),
  sum    = c(2L, 3L, 4L, 4L, 3L, 7L, 2L),
  Status = structure(c(1L, 2L, 1L, 1L, 2L, 3L, 1L),
                     .Label = c("A", "B", "C"), class = "factor")),
  class = "data.frame", row.names = c(NA, -7L))
