
How to summarize across multiple columns with condition on another (grouped) column with dplyr?

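I need to summarize a data.frame across multiple columns in a generic way:

  • the first summarize operation is straightforward, e.g. a simple median;
  • the second summarize involves a condition on another column, e.g. taking, within each group, the value found in the row where another column is at its minimum:

library(dplyr)

set.seed(4)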

myDF = data.frame(i = rep(1:3, each=3),
                  j = rnorm(9),
                  a = sample.int(9),
                  b = sample.int(9),
                  c = sample.int(9),
                  d = 'foo')
#   i          j a b c   d
# 1 1  0.2167549 4 5 5 foo
# 2 1 -0.5424926 7 7 4 foo
# 3 1  0.8911446 3 9 1 foo
# 4 2  0.5959806 8 6 8 foo
# 5 2  1.6356180 6 8 3 foo
# 6 2  0.6892754 1 4 6 foo
# 7 3 -1.2812466 9 1 7 foo
# 8 3 -0.2131445 5 2 2 foo
# 9 3  1.8965399 2 3 9 foo

myDF %>% group_by(i) %>% summarize(across(where(is.numeric), median, .names="med_{col}"),
                                   best_a = a[[which.min(j)]],
                                   best_b = b[[which.min(j)]],
                                   best_c = c[[which.min(j)]])
# # A tibble: 3 x 8
#      i   med_j med_a med_b med_c best_a best_b best_c
# * <int>   <dbl> <int> <int> <int>  <int>  <int>  <int>
# 1     1  0.217     4     7     4      7      7      4
# 2     2  0.689     6     6     6      8      6      8
# 3     3 -0.213     5     2     7      9      1      7

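How can I define the second summarize operation in a generic way (i.e., without writing out each column by hand as above)?

I need something like the following (which does not work, because j is not recognized):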

myfns = list(med = ~median(.),
             best = ~.[[which.min(j)]])
myDF %>% group_by(i) %>% summarize(across(where(is.numeric), myfns, .names="{fn}_{col}"))
# Error: Problem with `summarise()` input `..1`.
# x object 'j' not found
# ℹ Input `..1` is `across(where(is.numeric), myfns, .names = "{fn}_{col}")`.
# ℹ The error occurred in group 1: i = 1.

Use a second across to get, for columns a:c, the value in the row where j is at its minimum.

library(dplyr)

myDF %>% 
  group_by(i) %>% 
  summarize(across(where(is.numeric), median, .names = "med_{col}"),
            across(a:c, ~ .[which.min(j)], .names = "best_{col}"))

#      i  med_j med_a med_b med_c best_a best_b best_c
#* <int>  <dbl> <int> <int> <int>  <int>  <int>  <int>
#1     1  0.217     4     7     4      7      7      4
#2     2  0.689     6     6     6      8      6      8
#3     3 -0.213     5     2     7      9      1      7
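
As a cross-check on the best_* columns, the same per-group selection can be sketched in base R. This is only an illustration, not part of the original answer: the data frame is typed in from the printed myDF shown earlier (with j rounded as printed), and the names rows_at_min_j and best are made up for this sketch.

```r
# Data copied from the printed myDF (j is rounded to the printed digits).
myDF <- data.frame(
  i = rep(1:3, each = 3),
  j = c(0.2167549, -0.5424926, 0.8911446,
        0.5959806, 1.6356180, 0.6892754,
        -1.2812466, -0.2131445, 1.8965399),
  a = c(4L, 7L, 3L, 8L, 6L, 1L, 9L, 5L, 2L),
  b = c(5L, 7L, 9L, 6L, 8L, 4L, 1L, 2L, 3L),
  c = c(5L, 4L, 1L, 8L, 3L, 6L, 7L, 2L, 9L)
)

# For each group of i, keep the single row where j is smallest.
rows_at_min_j <- lapply(split(myDF, myDF$i), function(g) {
  g[which.min(g$j), c("i", "a", "b", "c")]
})

# Stack the one-row results and rename to match the dplyr output.
best <- do.call(rbind, rows_at_min_j)
names(best) <- c("i", "best_a", "best_b", "best_c")
best
```

The best_a, best_b and best_c values obtained this way should agree with the dplyr result above.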

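To do both summaries in a single across call (both functions are applied to every numeric column, so the result also contains a best_j column, which is simply the group minimum of j):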

myDF %>% 
  group_by(i) %>% 
  summarize(across(where(is.numeric), list(med = median, 
                                           best = ~.[which.min(j)]), 
                                      .names="{fn}_{col}"))
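
If the "value at the minimum of another column" logic is needed in several places, one option is to move it into a small helper and keep the reference to j inside the across call, where the grouped columns are visible. The sketch below assumes this is acceptable for the use case; value_at_min is a hypothetical helper name, not a dplyr function, and the data frame is typed in from the printed myDF.

```r
library(dplyr)

# Data copied from the printed myDF (j is rounded to the printed digits).
myDF <- data.frame(
  i = rep(1:3, each = 3),
  j = c(0.2167549, -0.5424926, 0.8911446,
        0.5959806, 1.6356180, 0.6892754,
        -1.2812466, -0.2131445, 1.8965399),
  a = c(4L, 7L, 3L, 8L, 6L, 1L, 9L, 5L, 2L),
  b = c(5L, 7L, 9L, 6L, 8L, 4L, 1L, 2L, 3L),
  c = c(5L, 4L, 1L, 8L, 3L, 6L, 7L, 2L, 9L)
)

# Hypothetical helper: the element of x at the position where `by` is smallest.
value_at_min <- function(x, by) x[[which.min(by)]]

res <- myDF %>%
  group_by(i) %>%
  summarize(across(a:c,
                   # The lambda is written inline so that j is looked up
                   # in the current group's data.
                   list(med = median, best = ~ value_at_min(.x, j)),
                   .names = "{fn}_{col}"))
res
```

Restricting the selection to a:c avoids the extra best_j column; the condition column is passed explicitly, so the helper itself does not depend on any particular column name.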
