
R - accessing dataframe column names passed as strings in function argument

With data like the following:

text = "
date,a,b
12/2/2019,18.1,0.017741935
12/2/2019,18.2,0.020967742
12/9/2019,16.7,0.020322581
12/9/2019,16.9,0.019677419
12/3/2019,18.1,0.017741935
12/3/2019,18.8,0.020967742
12/10/2019,16.2,0.020322581
12/10/2019,16.1,0.019677419
"
df1 = read.table(textConnection(text), sep=",", header = T)

I need to run a similar operation on multiple similar dataframes that have different column names, so a function makes sense. The function makes a scatter plot of two variables using dplyr and ggplot, as below.

dplyrGgFn = function(df, colNameX, colNameY) {

  # get average Y value for each x value point to be used
  df = df %>%
    select(colNameX, colNameY) %>%
    mutate(colNameX = round(colNameX,0)) %>%
    group_by(colNameX) %>%
    summarise(colNameY = mean(colNameY))

  # scatter plot of the averaged values
  return(
    ggplot(df, aes_string(x=colNameX, y=colNameY)) + 
      geom_point(aes(color = "blue"))
  )

}

It is called like this: dplyrGgFn(df1, "a", "b")

This function throws an error. The problem is in how it accesses the column names, which are passed as strings in the function call.

Error in round(colNameX, 0) : 
  non-numeric argument to mathematical function 

What is the recommended approach for handling strings passed as arguments for column names? I am looking for a generic answer, as it could apply to multiple cases.

Update:

User @Onyambu asked in a comment for a non-function version as a starting point, so I am adding it here.

df1 = df1 %>%
    select(a, b) %>%
    mutate(a = round(a,0)) %>%
    group_by(a) %>%
    summarise(b = mean(b))

ggplot(df1, aes(x=a, y=b)) + 
  geom_point(aes(color = "blue"))

I changed group_by(y) to group_by(x), since that seems to be what you intended. Otherwise, it is not clear what the function should do (as also mentioned in a comment).

The following code should help you understand how to pass variable names to dplyr code within functions.

dplyrGgFn = function(df, colNameX, colNameY) {

  # get average Y value for each x value point to be used
  df = df %>%
    select(!!colNameX, !!colNameY) %>%
    mutate(!!colNameX := round(!!as.name(colNameX), 0)) %>%
    group_by(!!as.name(colNameX)) %>%
    summarise(!!colNameY := mean(!!as.name(colNameY)))
  # scatter plot of the averaged values
  return(
    ggplot(df, aes_string(x=colNameX, y=colNameY)) + 
      geom_point(aes(color = "blue"))
  )
}
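
As an alternative to !! and as.name(), string column names can also be looked up with the .data pronoun. The following is only a minimal sketch, assuming dplyr 1.0 or later and ggplot2 3.0 or later; the helper name avgByRoundedX and the toy data frame are hypothetical and not part of the original post.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: averages col_y for each rounded value of col_x.
# col_x and col_y are column names given as strings.
avgByRoundedX <- function(df, col_x, col_y) {
  df %>%
    # "{col_x}" := names the new column after the string held in col_x
    mutate("{col_x}" := round(.data[[col_x]], 0)) %>%
    # all_of() selects the grouping column by its string name
    group_by(across(all_of(col_x))) %>%
    summarise("{col_y}" := mean(.data[[col_y]]), .groups = "drop")
}

# Toy data, made up for illustration
toy <- data.frame(a = c(1.1, 1.2, 2.6), b = c(10, 20, 30))
averaged <- avgByRoundedX(toy, "a", "b")

# .data[[...]] also works inside aes(), so aes_string() is not needed
p <- ggplot(averaged, aes(x = .data[["a"]], y = .data[["b"]])) +
  geom_point()
```

Because the summary columns keep the names passed in, the same strings can be reused in the plotting step.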

Looking at your code it is not clear what you are trying to do but here is something which might help if you want to pass quoted values in the function.

library(dplyr)
library(rlang)
library(ggplot2)

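dplyrGgFn = function(df, colNameX, colNameY) {
    # convert the column-name strings to symbols
    x_col <- sym(colNameX)
    y_col <- sym(colNameY)
    df %>%
      group_by(!!x_col) %>%
      # name the summary column after colNameY so that aes() can find it
      summarise(!!y_col := mean(!!y_col)) %>%
      ggplot() + aes(!!x_col, y = !!y_col) + geom_point()
}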

dplyrGgFn(df1, "a", "b")

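Note that aes_string() has been deprecated in favor of tidy evaluation (for example sym() with !!). Also, aes(color = "blue") does not color the points blue: inside aes(), "blue" is treated as a data value and mapped to the color scale, so the points get the scale's default color and a legend appears. To make the points blue, set color = "blue" outside aes(), as in geom_point(color = "blue").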
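
The difference between mapping and setting a color can be checked with a small sketch like the one below. The data frame pts is made up for illustration, and layer_data() is used only to inspect the colors ggplot2 computes for the points.

```r
library(ggplot2)

# Toy data, made up for illustration
pts <- data.frame(a = c(16, 17, 18), b = c(0.020, 0.020, 0.019))

# Mapped: "blue" is treated as a data value and sent through the color scale
p_mapped <- ggplot(pts, aes(x = a, y = b)) +
  geom_point(aes(color = "blue"))

# Set: color is a fixed parameter of the layer, so the points are blue
p_set <- ggplot(pts, aes(x = a, y = b)) +
  geom_point(color = "blue")

# Colors actually assigned to the points in each plot
mapped_colours <- layer_data(p_mapped)$colour
set_colours <- layer_data(p_set)$colour
```

Here set_colours holds "blue" for every point, while mapped_colours holds whatever the default discrete color scale assigns to the single level "blue".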


To pass unquoted variables, use {{ }}.

dplyrGgFn = function(df, colNameX, colNameY) {
  df %>%
    group_by({{colNameX}}) %>%
    # "{{colNameY}}" := names the summary column after the column passed in
    summarise("{{colNameY}}" := mean({{colNameY}})) %>%
    ggplot() + aes({{colNameX}}, y = {{colNameY}}) + geom_point()
}

dplyrGgFn(df1, a, b)


 