Compute row-wise summary statistics such as mean, max, and min across columns sharing similar names using dplyr
How to calculate mean, min, and max across columns when grouping using dplyr?
So I have a data frame, simplified to:
ID A B C
1 1 5 0
2 3 0 3
3 0 2 1
2 5 9 1
3 3 5 3
1 2 6 4
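Simply put, for each row I want to calculate the mean, median, min, and max across columns A, B, and C.
That part is easy enough; the hard part for me is what comes afterwards: how do I create an average of those values to represent each ID?
In other words, once I have the row-wise values, how do I show the average mean/median/min/max for each ID?
Expected output: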
(1)
ID Mean Median Min Max
1 2 1 0 5
2 2 3 0 3
3 1 1 0 2
2 5 5 1 9
3 3.66 3 3 5
1 4 4 2 6
(2)
ID AvgMean AvgMedian AvgMin AvgMax
1 3 2.5 1 5.5
2 3.5 4 1 6
3 2.33 3 3 3.5
You could try something like this:
library(dplyr)
# Pool all values of A, B and C within each ID and summarise them
df %>%
  group_by(ID) %>%
  summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
            medi_ = median(c_across(A:C), na.rm = TRUE),
            max_ = max(c_across(A:C), na.rm = TRUE),
            min_ = min(c_across(A:C), na.rm = TRUE))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 5
ID mean_ medi_ max_ min_
<int> <dbl> <dbl> <int> <int>
1 1 3 3 6 0
2 2 3.5 3 9 0
3 3 2.33 2.5 5 0
For the second part:
# Treat every row as its own group and summarise across A, B and C
df %>%
  rowwise() %>%
  summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
            medi_ = median(c_across(A:C), na.rm = TRUE),
            max_ = max(c_across(A:C), na.rm = TRUE),
            min_ = min(c_across(A:C), na.rm = TRUE))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
mean_ medi_ max_ min_
<dbl> <int> <int> <int>
1 2 1 5 0
2 2 3 3 0
3 1 1 2 0
4 5 5 9 1
5 3.67 3 5 3
6 4 4 6 2
With the data:
df <- structure(list(ID = c(1L, 2L, 3L, 2L, 3L, 1L), A = c(1L, 3L,
0L, 5L, 3L, 2L), B = c(5L, 0L, 2L, 9L, 5L, 6L), C = c(0L, 3L,
1L, 1L, 3L, 4L)), class = "data.frame", row.names = c(NA, -6L
))
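Note that the grouped summarise() above pools all values of A, B and C within an ID (giving, for example, a minimum of 0 for ID 1), whereas the question's second table averages the row-wise statistics per ID (giving 1 for ID 1). A minimal sketch of that two-step version follows, assuming dplyr 1.0 or later; the names `row_stats` and `id_avgs` are illustrative and do not come from the original answer.

```r
library(dplyr)

# The question's data
df <- data.frame(
  ID = c(1L, 2L, 3L, 2L, 3L, 1L),
  A  = c(1L, 3L, 0L, 5L, 3L, 2L),
  B  = c(5L, 0L, 2L, 9L, 5L, 6L),
  C  = c(0L, 3L, 1L, 1L, 3L, 4L)
)

# Step 1: one set of statistics per row, computed across A, B and C
row_stats <- df %>%
  rowwise() %>%
  mutate(Mean   = mean(c_across(A:C)),
         Median = median(c_across(A:C)),
         Min    = min(c_across(A:C)),
         Max    = max(c_across(A:C))) %>%
  ungroup() %>%
  select(ID, Mean:Max)

# Step 2: average those row-wise statistics within each ID
id_avgs <- row_stats %>%
  group_by(ID) %>%
  summarise(across(Mean:Max, mean), .groups = "drop")

id_avgs
```

Keeping `row_stats` as an intermediate result makes it possible to inspect the per-row table (1) before collapsing it into the per-ID table (2).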
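Thanks for posting the expected output. I would consider using summarise and across together:
library(dplyr)
# across() selects among the non-grouping columns, so name them explicitly
df %>%
  group_by(ID) %>%
  summarise(across(A:C, mean))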
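A self-contained sketch of the summarise()/across() idea, assuming the question's data frame `df` with columns A to C. It returns the mean of each column within each ID, which is a different quantity from the average of the row-wise statistics the question asks for; the name `col_means` is illustrative.

```r
library(dplyr)

# The question's data
df <- data.frame(
  ID = c(1L, 2L, 3L, 2L, 3L, 1L),
  A  = c(1L, 3L, 0L, 5L, 3L, 2L),
  B  = c(5L, 0L, 2L, 9L, 5L, 6L),
  C  = c(0L, 3L, 1L, 1L, 3L, 4L)
)

# Mean of each of A, B and C within every ID (one row per ID)
col_means <- df %>%
  group_by(ID) %>%
  summarise(across(A:C, mean), .groups = "drop")

col_means
# For ID 1 this gives A = 1.5, B = 5.5, C = 2
```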
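In base R, the following seems to do what the question asks. out is the data.frame of the ungrouped (row-wise) statistics and out2 is the data.frame of the grouped ones.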
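# Row-wise statistics for a data.frame X, one row of results per row of X
fun <- function(X){
  f <- function(x, na.rm = FALSE){
    c(
      Mean = mean(x, na.rm = na.rm),
      Median = median(x, na.rm = na.rm),
      Min = min(x, na.rm = na.rm),
      Max = max(x, na.rm = na.rm)
    )
  }
  t(apply(X, 1, f))
}
# df1 is the question's data frame (named df in the data listing above)
out <- lapply(split(df1[-1], df1$ID), fun)
out2 <- lapply(out, colMeans)
out <- do.call(rbind, out)
# Note: the ID column of out holds the original row names, not the group ID
out <- cbind.data.frame(ID = row.names(out), out)
out2 <- cbind.data.frame(ID = names(out2), do.call(rbind, out2))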
out
# ID Mean Median Min Max
#1 1 2.000000 1 0 5
#6 6 4.000000 4 2 6
#2 2 2.000000 3 0 3
#4 4 5.000000 5 1 9
#3 3 1.000000 1 0 2
#5 5 3.666667 3 3 5
out2
# ID Mean Median Min Max
#1 1 3.000000 2.5 1.0 5.5
#2 2 3.500000 4.0 0.5 6.0
#3 3 2.333333 2.0 1.5 3.5
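A shorter base R variant of the same idea, offered only as a sketch: compute the row-wise statistics once with apply(), attach the real ID column, then average per ID with aggregate(). The names `row_stat`, `per_row` and `per_id` are illustrative and are not part of the answer above.

```r
# The question's data
df <- data.frame(
  ID = c(1L, 2L, 3L, 2L, 3L, 1L),
  A  = c(1L, 3L, 0L, 5L, 3L, 2L),
  B  = c(5L, 0L, 2L, 9L, 5L, 6L),
  C  = c(0L, 3L, 1L, 1L, 3L, 4L)
)

# Statistics for one row's values
row_stat <- function(x) {
  c(Mean = mean(x), Median = median(x), Min = min(x), Max = max(x))
}

# Table (1): one row of statistics per input row, keeping the group ID
per_row <- cbind(ID = df$ID,
                 as.data.frame(t(apply(df[c("A", "B", "C")], 1, row_stat))))

# Table (2): average of the row-wise statistics within each ID
per_id <- aggregate(. ~ ID, data = per_row, FUN = mean)

per_row
per_id
```

Unlike the out table above, `per_row` carries the group ID rather than the row name in its ID column, and its rows stay in the input order.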