How to do conditional grouping and summarising in dplyr (R)
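How can I combine the rows of a data frame like the one below in R, so that each name keeps the Status of its row with the largest value in the sum column, while the other columns are summed?
So for an input like this: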
Name score1 score2 score3 sum Status
John 1 1 0 2 A
John 0 3 0 3 B
Smith 0 1 3 4 A
Sean 1 2 1 4 A
Sean 1 0 2 3 B
Sean 5 1 1 7 C
Carl 0 1 1 2 A
I would like to get this output:
Name score1 score2 score3 sum Status
John 1 4 0 5 B
Smith 0 1 3 4 A
Sean 7 3 4 14 C
Carl 0 1 1 2 A
We can compute the total of sum for each Name and take the Status that corresponds to the maximum sum within that Name.
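library(dplyr)

df %>%
  group_by(Name) %>%
  # sum(sum) applies base sum() to the `sum` column;
  # Status is taken from the row with the largest `sum` in each group
  summarise(Sum = sum(sum), Status = Status[which.max(sum)])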
# Name Sum Status
# <fct> <int> <fct>
#1 Carl 2 A
#2 John 5 B
#3 Sean 14 C
#4 Smith 4 A
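The output above keeps only the total and the Status, whereas the desired output in the question also sums score1 to score3. The following is a minimal sketch of one way to get all columns, assuming the column names from the question. The data frame is rebuilt with plain character columns so the block runs on its own, and the name `result` is only for illustration. Status is computed first on purpose: once `sum` has been replaced by its group total inside summarise(), which.max(sum) would only see a single value.

```r
library(dplyr)

# Same values as in the question, rebuilt so this block runs on its own
df <- data.frame(
  Name   = c("John", "John", "Smith", "Sean", "Sean", "Sean", "Carl"),
  score1 = c(1L, 0L, 0L, 1L, 1L, 5L, 0L),
  score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
  score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L),
  sum    = c(2L, 3L, 4L, 4L, 3L, 7L, 2L),
  Status = c("A", "B", "A", "A", "B", "C", "A")
)

result <- df %>%
  group_by(Name) %>%
  summarise(
    # Pick the Status while `sum` still holds the per-row values
    Status = Status[which.max(sum)],
    score1 = sum(score1),
    score2 = sum(score2),
    score3 = sum(score3),
    # Overwrite `sum` last, after everything that needs the row values
    sum    = sum(sum)
  ) %>%
  # Restore the column order used in the question
  select(Name, score1, score2, score3, sum, Status)

result
```

The rows come back sorted by Name rather than in their original order, as in the answer's own output.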
Or use the same logic with data.table:
library(data.table)
setDT(df)[, .(Sum = sum(sum), Status = Status[which.max(sum)]), Name]
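Both versions rely on which.max(), which returns the position of the first maximum only. If two rows for the same name share the largest sum, the Status of the earlier row is the one kept. A small base R illustration, using made-up vectors that are not part of the question's data:

```r
# Hypothetical values with a tie for the largest sum
s      <- c(3L, 7L, 7L)
status <- c("A", "B", "C")

first_max <- which.max(s)       # position of the first maximum
picked    <- status[first_max]  # Status at that position

# All positions sharing the maximum, in case every tied Status is needed
all_max <- which(s == max(s))
```

In the question's data no name has a tie, so the choice does not affect the result there.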
Data
df <- structure(list(Name = structure(c(2L, 2L, 4L, 3L, 3L, 3L, 1L),
.Label = c("Carl","John", "Sean", "Smith"), class = "factor"), score1 = c(1L, 0L,
0L, 1L, 1L, 5L, 0L), score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L), sum = c(2L, 3L, 4L,
4L, 3L, 7L, 2L), Status = structure(c(1L, 2L, 1L, 1L, 2L,
3L, 1L), .Label = c("A", "B", "C"), class = "factor")), class = "data.frame",
row.names = c(NA, -7L))