
Passing column names as both variables and columns in a single dplyr function in R

I am writing code in which a column name (e.g. "Category") is supplied by the user and assigned to a variable biz.area. For example:

biz.area <- "Category"

The original data frame is saved as risk.data. The user also supplies the range of columns to analyze by providing column names for the variables first.column and last.column.

Text in these columns will be broken up into bigrams for further text analysis including tf_idf.

My code for this analysis is given below.

x.bigrams <- risk.data %>% 
  gather(fields, alldata, first.column:last.column) %>% 
  unnest_tokens(bigrams,alldata,token = "ngrams", n=2) %>% 
  count(bigrams, biz.area, sort=TRUE) %>%
  bind_tf_idf(bigrams, biz.area, n) %>%
  arrange(desc(tf_idf))

However, I get the following error.

Error in grouped_df_impl(data, unname(vars), drop) : Column x.biz.area is unknown

As far as I can tell, this is because count() treats biz.area as a literal column name rather than using the string stored in the variable. If I use count_() instead, I get the following error.

Error in compat_lazy_dots(vars, caller_env()) : object 'bigrams' not found

This seems to be because count_() evaluates its arguments as ordinary R objects, and bigrams is a column of the data frame, not a variable in my environment.

How can I pass both a literal column name and a variable holding a column name to count() or count_()?

Thanks for your suggestion!

It looks to me like you need quosures, so that you can pass column names as variables rather than as strings or values. Since you're already using dplyr, you can use dplyr's non-standard (tidy) evaluation techniques.

Try something along these lines:

library(tidyverse)
library(tidytext)  # provides unnest_tokens() and bind_tf_idf()

analyze_risk  <- function(area, firstcol, lastcol) {

    # turn your arguments into quosures
    areaq  <- enquo(area)
    firstcolq <- enquo(firstcol)
    lastcolq <- enquo(lastcol)

    # run your analysis on the risk data
    risk.data %>% 
      # parentheses make the precedence of !! around `:` explicit
      gather(fields, alldata, (!!firstcolq):(!!lastcolq)) %>% 
      unnest_tokens(bigrams,alldata,token = "ngrams", n=2) %>% 
      count(bigrams, !!areaq, sort=TRUE) %>%
      bind_tf_idf(bigrams, !!areaq, n) %>%
      arrange(desc(tf_idf))
}

In this case, your users would pass bare column names into the function like this:

myresults  <- analyze_risk(Category, Name_of_Firstcol, Name_of_Lastcol)

If you want users to pass in strings instead, enquo() is not the right tool: convert each string to a symbol with rlang::sym() and unquote the result with !!.
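Since the question stores the column names as character strings, here is a minimal sketch of a string-based variant. It assumes one workable approach is to turn each string into a symbol with rlang::sym() and unquote it with !!. The toy data frame and the names area_sym, first_sym and last_sym are hypothetical and only stand in for risk.data; the tidytext steps are left out so the example depends only on dplyr, tidyr and rlang.

```r
library(dplyr)
library(tidyr)
library(rlang)

# Hypothetical toy data standing in for risk.data (not from the question)
toy <- tibble(
  Category = c("Finance", "Finance", "Ops"),
  Note1    = c("credit risk", "credit risk", "credit risk"),
  Note2    = c("market risk", "credit risk", "system outage")
)

# Column names supplied by the user as strings
biz.area     <- "Category"
first.column <- "Note1"
last.column  <- "Note2"

# Convert each string to a symbol so it can be unquoted with !!
area_sym  <- sym(biz.area)
first_sym <- sym(first.column)
last_sym  <- sym(last.column)

result <- toy %>%
  # parentheses keep each !! bound to its own symbol inside the range
  gather(fields, alldata, (!!first_sym):(!!last_sym)) %>%
  # mix a literal column name (alldata) with a user-supplied one
  count(alldata, !!area_sym, sort = TRUE)

print(result)
```

The same symbols could then be unquoted in the later bind_tf_idf() call, in the same way the quosures are used in the function above.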
