How to pass column names into a function (dplyr)

I'm trying to create a simple summary function to speed up reporting on multiple columns of data in an R Markdown file.

var1 is the name of a categorical column, t_var is the name of an integer column giving the quarter, and dt is the full data frame.

summarise_data_categorical <- function(var1, t_var, dt){

  print(var1)
  print(t_var)

  #Select the columns to aggregate
  group_func <- dt %>% 
    select(one_of(t_var, var1)) %>%
    group_by(t_var,var1)

  #create simple count summary
  count_table <- group_func %>%
    summarise(count = n()) %>%
    spread(t_var, count)

  #create a frequency version of the same table...
  freq <- dt %>%
    select(t_var, var1) %>%
    group_by(t_var,var1) %>%
    summarise(count = n()) %>%
    mutate(freq = round(count / sum(count),3)*100) %>%
    select(-count)

  #Present that table
  freq_table <- freq %>%
    spread(t_var, freq)

  #Create the chart to do the same thing..
  freq_chart <- freq %>%
    ggplot()+
    geom_line(mapping=aes(x=t_var, y = freq, colour=var1))

  #Compile outputs as a list
  results <- list(count_table, freq_table, freq_chart)

  #Return list
  results

}

Say I've got a data frame:

fr <- data.frame(lets = sample(LETTERS, 100, replace=TRUE),
           `quarter type` = sample(1:4, 100, replace=TRUE))

If I run the function like this:

summarise_data_categorical("lets", "quarter type", fr)

The initial output is promising:

[1] "lets"
[1] "quarter type"

(NOTE: while recreating the data for this example, for some reason I also receive the warning:

Unknown variables: quarter type

although this doesn't appear with my original data.)
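The warning most likely comes from the example data rather than from the function. By default data.frame() passes column names through make.names() (check.names = TRUE), so the column written as `quarter type` is silently renamed to quarter.type. After that, one_of("quarter type") cannot find it. The quick base-R check below shows this, along with one way to keep the space. The answer below appears to avoid the problem by using a column named quartertype.

```r
# Default: check.names = TRUE, so make.names() turns the space into "."
fr <- data.frame(lets = sample(LETTERS, 100, replace = TRUE),
                 `quarter type` = sample(1:4, 100, replace = TRUE))
names(fr)      # c("lets", "quarter.type")

# check.names = FALSE keeps the column name exactly as written
fr_raw <- data.frame(lets = sample(LETTERS, 100, replace = TRUE),
                     `quarter type` = sample(1:4, 100, replace = TRUE),
                     check.names = FALSE)
names(fr_raw)  # c("lets", "quarter type")
```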

The main problem is that I get an error:

Error in resolve_vars(new_groups, tbl_vars(.data)) : unknown variable to group by : t_var

Having come from Python, I'm still a bit confused about how to refer to columns. Can someone explain how to fix what I've got wrong?
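In the dplyr version used here, group_by(t_var, var1) treats t_var and var1 as column names. It therefore looks for columns literally called t_var and var1, not for the strings those variables hold. The function is called with strings, so one alternative to the quosure approach is to keep that string interface and select the grouping columns by name. Below is a minimal sketch of just the counting step. It assumes dplyr 1.0.0 or later, which provides across(), all_of() and the .groups argument. The name count_by_strings is hypothetical.

```r
library(dplyr)

# Hypothetical helper: both column names are passed as character strings
count_by_strings <- function(var1, t_var, dt) {
  dt %>%
    # all_of() selects the columns whose names are stored in a character vector
    group_by(across(all_of(c(t_var, var1)))) %>%
    summarise(count = n(), .groups = "drop")
}

fr <- data.frame(lets = sample(LETTERS, 100, replace = TRUE),
                 quartertype = sample(1:4, 100, replace = TRUE))
count_by_strings("lets", "quartertype", fr)
```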

We can use the new quosures from the development version of dplyr (soon to be released as 0.6.0):

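library(dplyr)

summarise_data_categorical <- function(var1, t_var, dt){

  # Capture the bare column names the caller typed, as quosures
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  # String versions of the same names, for one_of()
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)

  dt %>%
    select(one_of(v1, v2)) %>%
    # !! unquotes each quosure so group_by() sees the actual columns
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())

}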
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]

#   quartertype   lets count
#         <int> <fctr> <int>
#1            1      A     1
#2            1      F     2
#3            1      G     2
#4            1      H     1
#5            1      I     1
#6            1      J     4
#7            1      M     3
#8            1      N     1
#9            1      P     1
#10           1      S     5
# ... with 55 more rows

enquo() does something similar to substitute() in base R: it captures the argument the caller supplied, without evaluating it, and converts it to a quosure. one_of() takes string arguments, so the quosures are converted to strings with quo_name(). Inside group_by(), summarise(), mutate(), etc., we can evaluate a quosure by unquoting it with UQ() or !!.
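The sketch below calls rlang directly to show what these pieces return. The helper names are hypothetical. enquo() does not evaluate its argument, so quartertype does not need to exist as a variable. The last helper shows !! splicing a captured column into a larger expression, which eval_tidy() then evaluates against a data frame.

```r
library(rlang)

# Hypothetical helpers contrasting base R and rlang argument capture
capture_base <- function(col) deparse(substitute(col))  # base R: expression -> string

capture_rlang <- function(col) {
  q <- enquo(col)  # quosure: the unevaluated expression plus the caller's environment
  quo_name(q)      # the captured expression as a string, e.g. for one_of()
}

capture_base(quartertype)   # "quartertype"
capture_rlang(quartertype)  # "quartertype"

# Hypothetical helper: !! splices the captured column into mean(...)
mean_of <- function(dt, col) {
  col <- enquo(col)
  eval_tidy(quo(mean(!!col)), data = dt)  # look up the column inside dt
}

mean_of(data.frame(x = c(1, 2, 3)), x)  # 2
```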


Quosures seem to work fine with dplyr, though we had some difficulty doing the same with the tidyr functions. The following code should work for the full function. For tidyr and ggplot2 it falls back to the string-based spread_() and aes_string(), using the names obtained from quo_name().

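library(dplyr)
library(tidyr)
library(ggplot2)

summarise_data_categorical <- function(var1, t_var, dt){

  # Capture the bare column names as quosures, plus string versions
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)

  # Counts per (quarter, category); dplyr verbs accept the quosures via !!
  Summ_func <- dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())

  # spread_() takes the key column name as a string
  count_table <- Summ_func %>%
    spread_(v2, "count")

  # Percentage of each category within its quarter: summarise() dropped
  # the last grouping level, so the data are still grouped by t_var
  freq <- Summ_func %>%
    mutate(freq = round(count / sum(count), 3) * 100) %>%
    select(-count)

  freq_table <- freq %>%
    spread_(v2, "freq")

  # aes_string() takes column names as strings
  freq_chart <- freq %>%
    ggplot() +
    geom_line(mapping = aes_string(x = v2, y = "freq", colour = v1))

  results <- list(count_table, freq_table, freq_chart)
  results

}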
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
#     lets   `1`   `2`   `3`   `4`
#*  <fctr> <int> <int> <int> <int>
#1       A    NA    NA     1     2
#2       B     2    NA    NA     1
#3       C     1     5     1     2
#4       E     1     1    NA    NA
#5       G    NA     1     2     2
#6       H     1    NA     1     1
#7       I    NA     1     1     2
#8       J     2     1     1     1
#9       K     1     1     2     1
#10      L    NA     2    NA    NA
# ... with 14 more rows

#[[2]]
# A tibble: 24 × 5
#     lets   `1`   `2`   `3`   `4`
#*  <fctr> <dbl> <dbl> <dbl> <dbl>
#1       A    NA    NA   3.1   9.5
#2       B   8.7    NA    NA   4.8
#3       C   4.3  20.8   3.1   9.5
#4       E   4.3   4.2    NA    NA
#5       G    NA   4.2   6.2   9.5
#6       H   4.3    NA   3.1   4.8
#7       I    NA   4.2   3.1   9.5
#8       J   8.7   4.2   3.1   4.8
#9       K   4.3   4.2   6.2   4.8
#10      L    NA   8.3    NA    NA
# ... with 14 more rows

#[[3]]

[Image: the freq_chart line plot]
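Later releases remove the need for the string fallbacks. rlang 0.4.0 added the {{ }} ("curly-curly") operator as shorthand for !!enquo(x). tidyr 1.0.0 added pivot_wider(), which accepts the same tidy-evaluation syntax as dplyr, and ggplot2's aes() accepts it too. The sketch below shows how the full function might look under those assumptions. It assumes dplyr 1.0.0 or later, tidyr 1.0.0 or later and ggplot2 3.0.0 or later. The name summarise_data_categorical2 is hypothetical, and unlike the version above it returns ungrouped tables.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical tidy-eval rewrite: columns are passed as bare names
summarise_data_categorical2 <- function(var1, t_var, dt) {
  # Counts per (quarter, category); {{ }} forwards the caller's column names
  counts <- dt %>%
    count({{ t_var }}, {{ var1 }}, name = "count")

  # One column per quarter; missing combinations become NA
  count_table <- counts %>%
    pivot_wider(names_from = {{ t_var }}, values_from = count)

  # Percentage of each category within its quarter
  freq <- counts %>%
    group_by({{ t_var }}) %>%
    mutate(freq = round(count / sum(count), 3) * 100) %>%
    ungroup() %>%
    select(-count)

  freq_table <- freq %>%
    pivot_wider(names_from = {{ t_var }}, values_from = freq)

  freq_chart <- ggplot(freq, aes(x = {{ t_var }}, y = freq, colour = {{ var1 }})) +
    geom_line()

  list(count_table, freq_table, freq_chart)
}
```

Usage mirrors the answer: summarise_data_categorical2(lets, quartertype, fr).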
