
How to calculate mean, min, and max across columns when grouping using dplyr?

I have a data frame, simplified as:

ID A B C
1  1 5 0 
2  3 0 3
3  0 2 1
2  5 9 1 
3  3 5 3
1  2 6 4

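Put simply, I want to calculate the following for each row:

  • Mean
  • Median
  • Maximum
  • Minimum

That part is easy. The hard part for me is what comes next: how do I average those row-wise values so that each ID is represented by a single row?

In other words, once I have these values, how do I show the average mean/median/max/min for each ID?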

Expected output:

(1)

ID  Mean Median Min Max
1      2      1   0   5
2      2      3   0   3
3      1      1   0   2
2      5      5   1   9
3   3.66      3   3   5
1      4      4   2   6   

(2)

ID  AvgMean AvgMedian AvgMin AvgMax
1         3       2.5      1    5.5  
2       3.5         4      1      6 
3      2.33         3      3    3.5

You could try something like this:

    library(dplyr)

    # Pool all A:C values within each ID, then summarise the pooled values
    df %>%
      group_by(ID) %>%
      summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
                medi_ = median(c_across(A:C), na.rm = TRUE),
                max_  = max(c_across(A:C), na.rm = TRUE),
                min_  = min(c_across(A:C), na.rm = TRUE))
    
    `summarise()` ungrouping output (override with `.groups` argument)
    # A tibble: 3 x 5
         ID mean_ medi_  max_  min_
      <int> <dbl> <dbl> <int> <int>
    1     1  3      3       6     0
    2     2  3.5    3       9     0
    3     3  2.33   2.5     5     0

For the second part, the statistics for each row:

# Treat every row as its own group; note that summarise() drops the ID column here
df %>%
  rowwise() %>%
  summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
            medi_ = median(c_across(A:C), na.rm = TRUE),
            max_  = max(c_across(A:C), na.rm = TRUE),
            min_  = min(c_across(A:C), na.rm = TRUE))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
  mean_ medi_  max_  min_
  <dbl> <int> <int> <int>
1  2        1     5     0
2  2        3     3     0
3  1        1     2     0
4  5        5     9     1
5  3.67     3     5     3
6  4        4     6     2

With the data:

df <- structure(list(ID = c(1L, 2L, 3L, 2L, 3L, 1L), A = c(1L, 3L, 
0L, 5L, 3L, 2L), B = c(5L, 0L, 2L, 9L, 5L, 6L), C = c(0L, 3L, 
1L, 1L, 3L, 4L)), class = "data.frame", row.names = c(NA, -6L
)) 
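Neither snippet above returns the per-ID average of the row-wise statistics, which is what expected output (2) describes. One possible way to chain the two steps is sketched below; the names `row_stats` and `id_stats` and the `Avg` prefix are only illustrative, and `mutate()` is used instead of `summarise()` so that the ID column survives the row-wise step. With the example data the result agrees with `out2` in the base R answer (for instance 0.5 for the average minimum of ID 2), which differs slightly from the hand-written table in the question.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  ID = c(1L, 2L, 3L, 2L, 3L, 1L),
  A  = c(1L, 3L, 0L, 5L, 3L, 2L),
  B  = c(5L, 0L, 2L, 9L, 5L, 6L),
  C  = c(0L, 3L, 1L, 1L, 3L, 4L)
)

# Step 1: one set of statistics per row; mutate() keeps the ID column
row_stats <- df %>%
  rowwise() %>%
  mutate(
    Mean   = mean(c_across(A:C), na.rm = TRUE),
    Median = median(c_across(A:C), na.rm = TRUE),
    Min    = min(c_across(A:C), na.rm = TRUE),
    Max    = max(c_across(A:C), na.rm = TRUE)
  ) %>%
  ungroup() %>%
  select(ID, Mean, Median, Min, Max)

# Step 2: average the row-wise statistics within each ID
id_stats <- row_stats %>%
  group_by(ID) %>%
  summarise(across(Mean:Max, mean, .names = "Avg{.col}"), .groups = "drop")

row_stats
id_stats
```

Here `row_stats` corresponds to expected output (1) and `id_stats` to expected output (2).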

Thanks for posting the expected output. I would consider using summarize and across together:

library(dplyr)

# Mean of each column within each ID (the pipe after group_by() is required)
df %>%
  group_by(ID) %>%
  summarize(across(A:C, mean))
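Note that this approach summarises each column separately within an ID, rather than each row. As a rough sketch of how `across()` can apply several functions at once, a named list of functions can be passed together with a `.names` template; the name `col_stats` and the `{.col}_{.fn}` naming pattern are just choices made for this illustration.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  ID = c(1L, 2L, 3L, 2L, 3L, 1L),
  A  = c(1L, 3L, 0L, 5L, 3L, 2L),
  B  = c(5L, 0L, 2L, 9L, 5L, 6L),
  C  = c(0L, 3L, 1L, 1L, 3L, 4L)
)

# For every ID, compute mean, min and max of each column A, B and C
col_stats <- df %>%
  group_by(ID) %>%
  summarise(
    across(A:C, list(mean = mean, min = min, max = max), .names = "{.col}_{.fn}"),
    .groups = "drop"
  )

col_stats
```

The result has one row per ID and one column per column/function pair (`A_mean`, `A_min`, and so on), so it answers a different question from the row-wise statistics asked about above.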

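In base R, the following seems to do what the question asks.

out is a data.frame of the row-wise (ungrouped) statistics, and out2 is a data.frame of those statistics averaged by ID.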

fun <- function(X){
  f <- function(x, na.rm = FALSE){
    c(
      Mean = mean(x, na.rm = na.rm),
      Median = median(x, na.rm = na.rm),
      Min = min(x, na.rm = na.rm),
      Max = max(x, na.rm = na.rm)
    )
  }
  t(apply(X, 1, f))
}

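# Row-wise statistics, computed separately for the rows of each ID
out <- lapply(split(df[-1], df$ID), fun)
# Average of the row-wise statistics within each ID
out2 <- lapply(out, colMeans)

# Repeat each ID once per row it contributed, then stack the per-ID matrices
ids <- rep(names(out), sapply(out, nrow))
out <- cbind.data.frame(ID = ids, do.call(rbind, out))
out2 <- cbind.data.frame(ID = names(out2), do.call(rbind, out2))

out
#  ID     Mean Median Min Max
#1  1 2.000000      1   0   5
#6  1 4.000000      4   2   6
#2  2 2.000000      3   0   3
#4  2 5.000000      5   1   9
#3  3 1.000000      1   0   2
#5  3 3.666667      3   3   5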


out2
#  ID     Mean Median Min Max
#1  1 3.000000    2.5 1.0 5.5
#2  2 3.500000    4.0 0.5 6.0
#3  3 2.333333    2.0 1.5 3.5
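For comparison, the same two tables can probably be built with a little less machinery by computing the row statistics on a matrix and then calling `aggregate()`. This is only a sketch of an alternative, not the method used above; `vals`, `row_stats` and `id_stats` are names made up for the example.

```r
# Same example data as in the question
df <- data.frame(
  ID = c(1L, 2L, 3L, 2L, 3L, 1L),
  A  = c(1L, 3L, 0L, 5L, 3L, 2L),
  B  = c(5L, 0L, 2L, 9L, 5L, 6L),
  C  = c(0L, 3L, 1L, 1L, 3L, 4L)
)

# Work on the value columns as a matrix so apply() sees plain numbers
vals <- as.matrix(df[c("A", "B", "C")])

# One row of statistics per original row, keeping the ID alongside
row_stats <- data.frame(
  ID     = df$ID,
  Mean   = rowMeans(vals),
  Median = apply(vals, 1, median),
  Min    = apply(vals, 1, min),
  Max    = apply(vals, 1, max)
)

# Average each statistic within ID
id_stats <- aggregate(cbind(Mean, Median, Min, Max) ~ ID, data = row_stats, FUN = mean)

row_stats
id_stats
```

Unlike the split/lapply version, `row_stats` stays in the original row order, and `id_stats` comes back sorted by ID.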
