How to do conditional grouping and summarising in dplyr (R)

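How can I combine the rows of a data frame like the one below in R so that each name gets the Status of its row with the largest value in the sum column, while the other columns are added up?

So for an input like this: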

Name    score1  score2  score3  sum  Status
John    1       1       0       2    A
John    0       3       0       3    B
Smith   0       1       3       4    A
Sean    1       2       1       4    A
Sean    1       0       2       3    B
Sean    5       1       1       7    C
Carl    0       1       1       2    A

I would like to get this output:

Name    score1  score2  score3  sum  Status
John    1       4       0       5    B
Smith   0       1       3       4    A
Sean    7       3       4       14   C
Carl    0       1       1       2    A

We can total sum within each Name and take the Status from the row where sum is largest:

library(dplyr)
df %>%
 group_by(Name) %>%
 summarise(Sum = sum(sum), Status = Status[which.max(sum)])

#  Name    Sum Status
#  <fct> <int> <fct> 
#1 Carl      2 A     
#2 John      5 B     
#3 Sean     14 C     
#4 Smith     4 A     
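The summary above returns only Sum and Status, whereas the requested output also totals each score column. Below is a minimal sketch of one way to get those columns too, assuming dplyr 1.0.0 or later (for across() and relocate()). The data frame is rebuilt here with character columns so the block runs on its own, and `result` is only an illustrative name.

```r
library(dplyr)

# Same values as the question's input, with Name and Status as character
df <- data.frame(
  Name   = c("John", "John", "Smith", "Sean", "Sean", "Sean", "Carl"),
  score1 = c(1L, 0L, 0L, 1L, 1L, 5L, 0L),
  score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
  score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L),
  sum    = c(2L, 3L, 4L, 4L, 3L, 7L, 2L),
  Status = c("A", "B", "A", "A", "B", "C", "A")
)

result <- df %>%
  group_by(Name) %>%
  summarise(
    # Pick Status first: later expressions in summarise() see earlier
    # results, so this must run before `sum` is replaced by its total.
    Status = Status[which.max(sum)],
    # Total every score column within the group
    across(score1:score3, ~ sum(.x)),
    # Total the `sum` column itself
    sum = sum(sum),
    .groups = "drop"
  ) %>%
  # Move Status back to the end, as in the requested output
  relocate(Status, .after = last_col())

print(result)
# Rows come back sorted by Name:
#   Carl   0 1 1  2 A
#   John   1 4 0  5 B
#   Sean   7 3 4 14 C
#   Smith  0 1 3  4 A
```

Note that which.max() returns the position of the first maximum, so if two rows of a group tie on sum, the Status of the earlier row is kept.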

Or use the same logic with data.table:

library(data.table)
setDT(df)[, .(Sum = sum(sum), Status = Status[which.max(sum)]), Name]
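The data.table call can be extended in the same way to total the score columns. This is a rough sketch rather than a definitive version: `dt`, `num_cols` and `res` are illustrative names, the table is rebuilt with character columns so the block is self-contained, and `base::sum` is written with its namespace only as a precaution because one column is also called sum.

```r
library(data.table)

# Same values as the question's input
dt <- data.table(
  Name   = c("John", "John", "Smith", "Sean", "Sean", "Sean", "Carl"),
  score1 = c(1L, 0L, 0L, 1L, 1L, 5L, 0L),
  score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
  score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L),
  sum    = c(2L, 3L, 4L, 4L, 3L, 7L, 2L),
  Status = c("A", "B", "A", "A", "B", "C", "A")
)

# Columns to total within each Name
num_cols <- c("score1", "score2", "score3", "sum")

res <- dt[,
  c(
    # Group totals for every column listed in .SDcols
    lapply(.SD, base::sum),
    # Status of the row holding the group's largest sum
    list(Status = Status[which.max(sum)])
  ),
  by = Name,
  .SDcols = num_cols
]

print(res)
```

Grouping with `by` keeps the groups in order of first appearance, so the rows here should come out as John, Smith, Sean, Carl, the same order as the requested output.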

Data

df <- structure(list(Name = structure(c(2L, 2L, 4L, 3L, 3L, 3L, 1L), 
.Label = c("Carl","John", "Sean", "Smith"), class = "factor"), score1 = c(1L, 0L, 
0L, 1L, 1L, 5L, 0L), score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L), 
score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L), sum = c(2L, 3L, 4L, 
4L, 3L, 7L, 2L), Status = structure(c(1L, 2L, 1L, 1L, 2L, 
3L, 1L), .Label = c("A", "B", "C"), class = "factor")), class = "data.frame", 
row.names = c(NA, -7L))

