How to pass column names into a dplyr function
I'm trying to create a simple summary function to speed up the reporting of multiple columns of data for use in an R Markdown file.
var1 is a categorical column of data, t_var is an integer representing the quarter of data, and dt is the full data.
summarise_data_categorical <- function(var1, t_var, dt){
  print(var1)
  print(t_var)
  #Select the columns to aggregate
  group_func <- dt %>%
    select(one_of(t_var, var1)) %>%
    group_by(t_var,var1)
  #create simple count summary
  count_table <- group_func %>%
    summarise(count = n()) %>%
    spread(t_var, count)
  #create a frequency version of the same table...
  freq <- dt %>%
    select(t_var, var1) %>%
    group_by(t_var,var1) %>%
    summarise(count = n()) %>%
    mutate(freq = round(count / sum(count),3)*100) %>%
    select(-count)
  #Present that table
  freq_table <- freq %>%
    spread(t_var, freq)
  #Create the chart to do the same thing..
  freq_chart <- freq %>%
    ggplot()+
    geom_line(mapping=aes(x=t_var, y = freq, colour=var1))
  #Compile outputs as a list
  results <- list(count_table, freq_table, freq_chart)
  #Return list
  results
}
Say I've got a data frame:
fr <- data.frame(lets = sample(LETTERS, 100, replace=TRUE),
`quarter type` = sample(1:4, 100, replace=TRUE))
If I run the function, thus:
summarise_data_categorical("lets", "quarter type", fr)
The initial output is promising:
[1] "lets"
[1] "quarter type"
(NOTE: in trying to recreate the data, for some reason I also receive the warning "Unknown variables: quarter type", although this doesn't appear with my original data.)
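A likely cause of that warning, sketched in base R: with its default `check.names = TRUE`, `data.frame()` runs column names through `make.names()`, so a name containing a space is altered and a later lookup for "quarter type" finds nothing. The object names `fr_default` and `fr_exact` and the seed are illustrative assumptions, not part of the original post.

```r
# Base R only; no packages needed.
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Default behaviour: check.names = TRUE rewrites non-syntactic names.
fr_default <- data.frame(lets = sample(LETTERS, 100, replace = TRUE),
                         `quarter type` = sample(1:4, 100, replace = TRUE))
names(fr_default)
# "lets" "quarter.type"

# Keeping the name exactly as typed requires check.names = FALSE.
fr_exact <- data.frame(lets = sample(LETTERS, 100, replace = TRUE),
                       `quarter type` = sample(1:4, 100, replace = TRUE),
                       check.names = FALSE)
names(fr_exact)
# "lets" "quarter type"
```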
The main thing is I get an error:
Error in resolve_vars(new_groups, tbl_vars(.data)) : unknown variable to group by : t_var
Having come from Python, I'm still a bit confused about how to refer to columns. Can someone explain how I can fix what I've got wrong?
We can use the new quosures from the devel version of dplyr (soon to be released in 0.6.0):
summarise_data_categorical <- function(var1, t_var, dt){
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)
  dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())
}
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]
# quartertype lets count
# <int> <fctr> <int>
#1 1 A 1
#2 1 F 2
#3 1 G 2
#4 1 H 1
#5 1 I 1
#6 1 J 4
#7 1 M 3
#8 1 N 1
#9 1 P 1
#10 1 S 5
# ... with 55 more rows
enquo works much like substitute from base R: it takes the input arguments and converts them to quosures. one_of takes string arguments, so the quosures are converted to strings with quo_name. Inside group_by/summarise/mutate etc., we can evaluate a quosure by unquoting it (UQ or !!).
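For comparison, here is a minimal base R sketch of the capture-then-convert idea that substitute provides. The helper names `capture_name` and `count_by` and the toy data frame are hypothetical, chosen only for illustration; unlike a quosure, the result of substitute does not carry the caller's environment with it.

```r
# Base R only; no packages needed.

# substitute() returns the unevaluated expression the caller typed, and
# deparse() turns it into a string (roughly the role quo_name() plays above).
capture_name <- function(col) {
  deparse(substitute(col))
}

capture_name(lets)
# "lets"

# Hypothetical helper: count the values of a column given as a bare name.
count_by <- function(dt, col) {
  nm <- deparse(substitute(col))  # bare name -> string
  table(dt[[nm]])                 # look the column up by that string
}

toy <- data.frame(lets = c("A", "B", "A"))
tab <- count_by(toy, lets)
```

Note that `lets` is never evaluated in `capture_name(lets)`, which is why the call works even though no object called `lets` exists in the calling environment.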
Quosures seem to work fine with dplyr, though we had some difficulty implementing the same approach with tidyr functions. The following should work for the full code:
summarise_data_categorical <- function(var1, t_var, dt){
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)
  Summ_func <- dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())
  count_table <- Summ_func %>%
    spread_(v2, "count")
  freq <- Summ_func %>%
    mutate(freq = round(count / sum(count),3)*100) %>%
    select(-count)
  freq_table <- freq %>%
    spread_(v2, "freq")
  freq_chart <- freq %>%
    ggplot()+
    geom_line(mapping=aes_string(x= v2 , y = "freq", colour= v1))
  results <- list(count_table, freq_table, freq_chart)
  results
}
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <int> <int> <int> <int>
#1 A NA NA 1 2
#2 B 2 NA NA 1
#3 C 1 5 1 2
#4 E 1 1 NA NA
#5 G NA 1 2 2
#6 H 1 NA 1 1
#7 I NA 1 1 2
#8 J 2 1 1 1
#9 K 1 1 2 1
#10 L NA 2 NA NA
# ... with 14 more rows
#[[2]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <dbl> <dbl> <dbl> <dbl>
#1 A NA NA 3.1 9.5
#2 B 8.7 NA NA 4.8
#3 C 4.3 20.8 3.1 9.5
#4 E 4.3 4.2 NA NA
#5 G NA 4.2 6.2 9.5
#6 H 4.3 NA 3.1 4.8
#7 I NA 4.2 3.1 9.5
#8 J 8.7 4.2 3.1 4.8
#9 K 4.3 4.2 6.2 4.8
#10 L NA 8.3 NA NA
# ... with 14 more rows
#[[3]]
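If the function should keep accepting column names as strings, as in the question's original call, one possible sketch is to turn each string into a symbol with `rlang::sym()` and unquote it with `!!`. This assumes a dplyr version that supports tidy evaluation together with the rlang package; the function name `count_by_strings` and the small data frame `fr_small` are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical helper: var1 and t_var are column names given as strings.
count_by_strings <- function(var1, t_var, dt) {
  dt %>%
    # sym() builds a symbol from the string; !! unquotes it inside group_by()
    group_by(!!rlang::sym(t_var), !!rlang::sym(var1)) %>%
    summarise(count = n()) %>%
    ungroup()
}

# check.names = FALSE keeps the space in "quarter type" (illustrative data).
fr_small <- data.frame(lets = c("A", "A", "B"),
                       `quarter type` = c(1L, 1L, 2L),
                       check.names = FALSE)

res <- count_by_strings("lets", "quarter type", fr_small)
```

Because `sym()` accepts non-syntactic names, a column such as "quarter type" can be used without renaming it first.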