Changing names of resulting variables in custom dplyr function
To speed up generating grouped summaries across multiple tables in my dplyr workflow, I have drafted a simple function that produces the desired metrics:
# Function to generate summary table
generate_summary_tbl <- function(dataset, group_column, summary_column) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)

  dataset %>%
    group_by(!!group_column) %>%
    summarise(
      mean = mean(!!summary_column),
      sum = sum(!!summary_column)
      # Other metrics that need to be generated frequently
    ) %>%
    ungroup -> smryDta

  return(smryDta)
}
The function works as expected:
>> mtcars %>%
... generate_summary_tbl(group_column = am, summary_column = mpg)
# A tibble: 2 x 3
     am     mean   sum
  <dbl>    <dbl> <dbl>
1     0 17.14737 325.8
2     1 24.39231 317.1
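I would like to conditionally include in the result names the name of the column passed via summary_column = mpg, controlled by an argument useColName = TRUE. When the function is called with useColName = TRUE, the result should correspond to: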
>> mtcars %>%
... generate_summary_tbl(group_column = am, summary_column = mpg,
useColName = TRUE)
# A tibble: 2 x 3
     am  mean_am sum_am
  <dbl>    <dbl>  <dbl>
1     0 17.14737  325.8
2     1 24.39231  317.1
The difference is in the variable names: mean_am carries the _am suffix, and so forth.
My partial, ugly solution uses setNames:
# Function to generate summary table
generate_summary_tbl <-
  function(dataset,
           group_column,
           summary_column,
           useColName = TRUE) {
    group_column <- enquo(group_column)
    summary_column <- enquo(summary_column)

    dataset %>%
      group_by(!!group_column) %>%
      summarise(mean = mean(!!summary_column),
                sum = sum(!!summary_column)) %>%
      ungroup -> smryDta

    if (useColName) {
      setNames(smryDta,
               c(deparse(substitute(group_column)),
                 paste(
                   names(smryDta)[2:length(smryDta)],
                   paste0("_", deparse(substitute(group_column)))
                 ))) -> smryDta
    }

    return(smryDta)
  }
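The returned column names almost match the desired result. I reckon I could apply some regular expressions to get there, but I think there should be a more efficient solution.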
mtcars %>%
  generate_summary_tbl(group_column = am, summary_column = mpg, useColName = TRUE)
# A tibble: 2 x 3
   `~am` `mean _~am` `sum _~am`
   <dbl>       <dbl>      <dbl>
1      0    17.14737      325.8
2      1    24.39231      317.1
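Two things appear to produce the odd names in the output above. By the time deparse(substitute(group_column)) runs, group_column has already been overwritten with a quosure, which is presumably where the `~am` comes from. Separately, paste() joins its arguments with a space by default, hence `mean _~am`. The sketch below shows one way to get a clean label instead. It assumes rlang is available (it is installed alongside dplyr), and column_label is a hypothetical helper introduced only for illustration.

```r
library(rlang)

# Hypothetical helper: capture the argument and return its name as a string.
column_label <- function(column) {
  as_label(enquo(column))
}

group_name <- column_label(am)                        # "am", without the leading "~"

# paste() inserts its default separator " " between the two pieces
spaced <- paste("mean", paste0("_", group_name))      # "mean _am"

# paste0() concatenates with no separator
joined <- paste0("mean", "_", group_name)             # "mean_am"
```

Inside generate_summary_tbl, the same idea would mean calling as_label(group_column) on the quosure and building the new names with paste0().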
Perhaps use rename_at:
library(tidyverse)

generate_summary_tbl <- function(dataset, group_column, summary_column, useColName = FALSE) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)

  dataset %>%
    group_by(!!group_column) %>%
    summarise(
      mean = mean(!!summary_column),
      sum = sum(!!summary_column)
      # Other metrics that need to be generated frequently
    ) %>%
    ungroup -> smryDta

  # Prefix every column except the grouping column with the group name
  if (useColName)
    smryDta <- smryDta %>%
      rename_at(
        vars(-one_of(quo_name(group_column))),
        ~paste(quo_name(group_column), .x, sep = "_")
      )

  return(smryDta)
}
mtcars %>% generate_summary_tbl(am, mpg)
# # A tibble: 2 x 3
#      am     mean   sum
#   <dbl>    <dbl> <dbl>
# 1     0 17.14737 325.8
# 2     1 24.39231 317.1
mtcars %>% generate_summary_tbl(am, mpg, TRUE)
# # A tibble: 2 x 3
#      am  am_mean am_sum
#   <dbl>    <dbl>  <dbl>
# 1     0 17.14737  325.8
# 2     1 24.39231  317.1
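The answer above puts the group name in front (am_mean), whereas the question asked for it as a suffix (mean_am). Here is a minimal sketch of a suffix variant. It assumes dplyr 1.0.0 or later, where rename_with(), all_of(), the `{{ }}` operator and the .groups argument are available. It is one possible rewrite, not the answerer's code.

```r
library(dplyr)

generate_summary_tbl <- function(dataset, group_column, summary_column,
                                 useColName = FALSE) {
  # as_label() turns the captured argument into a plain string such as "am"
  group_name <- rlang::as_label(rlang::enquo(group_column))

  smryDta <- dataset %>%
    group_by({{ group_column }}) %>%
    summarise(
      mean = mean({{ summary_column }}),
      sum = sum({{ summary_column }}),
      .groups = "drop"   # return an ungrouped tibble
    )

  if (useColName) {
    # Append "_<group name>" to every column except the grouping column
    smryDta <- smryDta %>%
      rename_with(~ paste0(.x, "_", group_name), .cols = -all_of(group_name))
  }

  smryDta
}

with_suffix <- mtcars %>% generate_summary_tbl(am, mpg, useColName = TRUE)
without_suffix <- mtcars %>% generate_summary_tbl(am, mpg)
```

With useColName = TRUE the columns should come out as am, mean_am and sum_am. If the suffix is meant to come from the summarised column instead, as the wording of the question suggests, replacing enquo(group_column) with enquo(summary_column) when building the label would give mean_mpg and sum_mpg. In that case the label needs its own variable name, since group_name is also used to exclude the grouping column.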