
Using dplyr summarise() with across() and glue-based formatting does not produce column names as expected

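In R, I am trying to aggregate several columns by computing more than one summary statistic for each. I also want to use the .names argument of across() so that the column names of the resulting tibble say which summary function was applied.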

I tried:

library(tidyverse)
library(palmerpenguins)

penguins_stats <- penguins %>% 
  dplyr::group_by(species) %>% 
  dplyr::summarise(across(.cols = ends_with("mm"), 
                          .fns = list(~mean(.x, na.rm = TRUE), 
                                      ~sd(.x, na.rm = TRUE)),
                          .names = "{.col}_{.fn}"))

However, the column names of the resulting output carry the suffixes _1 and _2 instead of the _mean and _sd I expected:

names(penguins_stats)
# [1] "species"             "bill_length_mm_1"   
# [3] "bill_length_mm_2"    "bill_depth_mm_1"    
# [5] "bill_depth_mm_2"     "flipper_length_mm_1"
# [7] "flipper_length_mm_2"

Session info:

sessionInfo()
# R version 4.0.3 (2020-10-10)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Catalina 10.15.7
# 
# Matrix products: default
# BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods  
# [7] base     
# 
# other attached packages:
# [1] palmerpenguins_0.1.0 forcats_0.5.0       
# [3] stringr_1.4.0        dplyr_1.0.2         
# [5] purrr_0.3.4          readr_1.4.0         
# [7] tidyr_1.1.2          tibble_3.0.4        
# [9] ggplot2_3.3.2        tidyverse_1.3.0 

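You need to pass a named list to .fns if you want to use {.fn} in .names. The {.fn} placeholder is replaced by the name of each list element, and when the list is unnamed it falls back to the element's position (1, 2, ...).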

library(dplyr)
library(palmerpenguins)  # provides the penguins data set
penguins_stats <- penguins %>% 
  dplyr::group_by(species) %>% 
  dplyr::summarise(across(.cols = ends_with("mm"), 
                          .fns = list(mean = ~mean(.x, na.rm = TRUE), 
                                      sd = ~sd(.x, na.rm = TRUE)),
                          .names = "{.col}_{.fn}"))

names(penguins_stats)
#[1] "species"                "bill_length_mm_mean"    "bill_length_mm_sd"     
#[4] "bill_depth_mm_mean"     "bill_depth_mm_sd"       "flipper_length_mm_mean"
#[7] "flipper_length_mm_sd"  
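
To see the difference side by side, here is a minimal sketch that runs the same summary with an unnamed and a named list. It assumes the built-in iris data set in place of penguins (swapped in only so the example runs without palmerpenguins); the object names unnamed and named are illustrative.

```r
library(dplyr)

# Unnamed list: {.fn} falls back to each function's position in the list
unnamed <- iris %>%
  group_by(Species) %>%
  summarise(across(.cols = starts_with("Sepal"),
                   .fns = list(~mean(.x, na.rm = TRUE),
                               ~sd(.x, na.rm = TRUE)),
                   .names = "{.col}_{.fn}"))

# Named list: {.fn} is replaced by the names given to the list elements
named <- iris %>%
  group_by(Species) %>%
  summarise(across(.cols = starts_with("Sepal"),
                   .fns = list(mean = ~mean(.x, na.rm = TRUE),
                               sd = ~sd(.x, na.rm = TRUE)),
                   .names = "{.col}_{.fn}"))

names(unnamed)
# [1] "Species"        "Sepal.Length_1" "Sepal.Length_2" "Sepal.Width_1"
# [5] "Sepal.Width_2"

names(named)
# [1] "Species"           "Sepal.Length_mean" "Sepal.Length_sd"
# [4] "Sepal.Width_mean"  "Sepal.Width_sd"
```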

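However, if you pass a named list, you do not need .names here at all, because the default naming pattern when .fns is a list is already "{.col}_{.fn}":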

penguins_stats <- penguins %>% 
  dplyr::group_by(species) %>% 
  dplyr::summarise(across(.cols = ends_with("mm"), 
                          .fns = list(mean = ~mean(.x, na.rm = TRUE), 
                                      sd = ~sd(.x, na.rm = TRUE))))
names(penguins_stats)

#[1] "species"                "bill_length_mm_mean"    "bill_length_mm_sd"     
#[4] "bill_depth_mm_mean"     "bill_depth_mm_sd"       "flipper_length_mm_mean"
#[7] "flipper_length_mm_sd"  
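
.names is still useful when you want a pattern other than the default. As a rough sketch, again assuming the built-in iris data set rather than penguins, the following puts the function name in front of the column name; the object name iris_stats is illustrative.

```r
library(dplyr)

# Custom glue specification: function name first, then the column name
iris_stats <- iris %>%
  group_by(Species) %>%
  summarise(across(.cols = starts_with("Sepal"),
                   .fns = list(mean = ~mean(.x, na.rm = TRUE),
                               sd = ~sd(.x, na.rm = TRUE)),
                   .names = "{.fn}_{.col}"))

names(iris_stats)
# [1] "Species"           "mean_Sepal.Length" "sd_Sepal.Length"
# [4] "mean_Sepal.Width"  "sd_Sepal.Width"
```

Any text outside the braces is copied literally, so separators other than the underscore work the same way.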

