R - accessing dataframe column names passed as strings in function argument

With data like below:

text = "
date,a,b
12/2/2019,18.1,0.017741935
12/2/2019,18.2,0.020967742
12/9/2019,16.7,0.020322581
12/9/2019,16.9,0.019677419
12/3/2019,18.1,0.017741935
12/3/2019,18.8,0.020967742
12/10/2019,16.2,0.020322581
12/10/2019,16.1,0.019677419
"
df1 = read.table(textConnection(text), sep=",", header = TRUE)

I need to run a similar operation on multiple similar dataframes with different column names, so a function makes sense. The function makes a scatter plot of two variables using dplyr and ggplot, like below.

dplyrGgFn = function(df, colNameX, colNameY) {

  # get average Y value for each x value point to be used
  df = df %>%
    select(colNameX, colNameY) %>%
    mutate(colNameX = round(colNameX,0)) %>%
    group_by(colNameX) %>%
    summarise(colNameY = mean(colNameY))

  # 
  return(
    ggplot(df, aes_string(x=colNameX, y=colNameY)) + 
      geom_point(aes(color = "blue"))
  )

}

And call it like dplyrGgFn(df1, "a", "b").

Obviously this function throws an error, and as you may see, the problem is with accessing the column name variables passed as strings in the function call.

Error in round(colNameX, 0) : 
  non-numeric argument to mathematical function 

What is the recommended approach to handle strings passed as arguments for column names? I'm looking for a generic answer, as it could be applicable to multiple cases.

Update:

User @Onyambu suggested adding a non-function version as a starting point, so here it is.

df1 = df1 %>%
    select(a, b) %>%
    mutate(a = round(a,0)) %>%
    group_by(a) %>%
    summarise(b = mean(b))

ggplot(df1, aes(x=a, y=b)) + 
  geom_point(aes(color = "blue"))

I changed the group_by(y) to group_by(x), as that seems to be what you intended. Otherwise, it is not clear (as also mentioned in a comment).

The following code should help you understand how to pass variable names to dplyr code within functions.

library(dplyr)
library(ggplot2)

dplyrGgFn = function(df, colNameX, colNameY) {

  # get average Y value for each x value point to be used
  df = df %>%
    select(!!colNameX, !!colNameY) %>%
    # as.name() turns the string into a symbol; !! injects it into the call
    mutate(!!colNameX := round(!!as.name(colNameX), 0)) %>%
    group_by(!!as.name(colNameX)) %>%
    summarise(!!colNameY := mean(!!as.name(colNameY)))
  # scatter plot of the averaged values
  return(
    ggplot(df, aes_string(x=colNameX, y=colNameY)) + 
      geom_point(color = "blue")  # a fixed color goes outside aes()
  )
}
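To see why the original function fails and what the unquoting changes, here is a minimal sketch. The small data frame and the column name `rounded` are made up for illustration only.

```r
library(dplyr)

# Toy data, invented for this sketch
df <- data.frame(a = c(18.1, 18.2, 16.7), b = c(0.01, 0.02, 0.03))
colNameX <- "a"

# Without unquoting, colNameX is just the string "a",
# so dplyr ends up calling round("a", 0), which is an error.
err <- tryCatch(
  mutate(df, rounded = round(colNameX, 0)),
  error = function(e) "error"
)

# as.name() turns the string into the symbol `a`; !! injects that symbol
# into the expression, so dplyr evaluates round(a, 0) on the column.
ok <- mutate(df, rounded = round(!!as.name(colNameX), 0))
```

Here `err` holds the fallback value from the error handler, while `ok` contains the rounded column.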

Looking at your code, it is not clear what you are trying to do, but here is something that might help if you want to pass quoted values to the function.

library(dplyr)
library(rlang)
library(ggplot2)

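dplyrGgFn = function(df, colNameX, colNameY) {
    # convert the column-name strings to symbols
    x_col <- sym(colNameX)
    y_col <- sym(colNameY)
    df %>%
      group_by(!!x_col) %>%
      # keep the original column name so that aes() below can find it
      summarise(!!y_col := mean(!!y_col)) %>%
      ggplot() + aes(!!x_col, y= !!y_col) + geom_point()
}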

dplyrGgFn(df1, "a", "b")

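Note that aes_string has been deprecated in favor of tidy evaluation (sym with !!), and aes(color = "blue") doesn't do what you expect it to do: inside aes() the string "blue" is treated as data and mapped through the color scale, so the points are not drawn in blue. To set a fixed color, pass it outside aes(), as in geom_point(color = "blue").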
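Another option for string column names is the `.data` pronoun, which works in both dplyr verbs and `aes()`. The sketch below is one possible way to write it, not the only one; the helper names `meanByX` and `plotMeanByX` and the toy data are hypothetical, and it assumes dplyr 1.0 or later for `across()`.

```r
library(dplyr)
library(ggplot2)

# meanByX and plotMeanByX are hypothetical names used only in this sketch
meanByX <- function(df, colNameX, colNameY) {
  df %>%
    # .data[[ ]] looks up a column by its name given as a string
    mutate(!!colNameX := round(.data[[colNameX]], 0)) %>%
    group_by(across(all_of(colNameX))) %>%
    summarise(!!colNameY := mean(.data[[colNameY]]))
}

plotMeanByX <- function(df, colNameX, colNameY) {
  avg <- meanByX(df, colNameX, colNameY)
  ggplot(avg, aes(x = .data[[colNameX]], y = .data[[colNameY]])) +
    geom_point(color = "blue") +       # fixed color, set outside aes()
    labs(x = colNameX, y = colNameY)   # readable axis labels
}

# Toy data, invented for this sketch
toy <- data.frame(a = c(18.1, 18.2, 16.7, 16.9),
                  b = c(0.01, 0.03, 0.02, 0.04))
avg <- meanByX(toy, "a", "b")
p <- plotMeanByX(toy, "a", "b")
```

The summary step is kept separate from the plot so that the averaged data can be inspected on its own.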


To pass unquoted variables, use {{}}.

dplyrGgFn = function(df, colNameX, colNameY) {
  df %>%
    group_by({{colNameX}}) %>%
    # name the result after the input column so that aes() below can find it
    summarise("{{colNameY}}" := mean({{colNameY}})) %>%
    ggplot() + aes({{colNameX}}, y= {{colNameY}}) + geom_point()
}

dplyrGgFn(df1, a, b)
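Since the goal is to reuse the same operation on data frames with different column names, here is a small sketch of how embracing carries over. The helper `meanByGroup`, the data frames `dfA` and `dfB`, and their column names are all made up for illustration. The glue-style `"{{ }}" :=` syntax, which keeps the input column's name for the summary column, assumes a reasonably recent rlang and dplyr.

```r
library(dplyr)

# meanByGroup is a hypothetical helper used only in this sketch
meanByGroup <- function(df, xCol, yCol) {
  df %>%
    group_by({{ xCol }}) %>%
    # "{{ yCol }}" := names the summary column after the column passed in
    summarise("{{ yCol }}" := mean({{ yCol }}))
}

# Two invented data frames with different column names
dfA <- data.frame(a = c(1, 1, 2), b = c(10, 20, 30))
dfB <- data.frame(temp = c(5, 5, 6), rate = c(1, 3, 5))

resA <- meanByGroup(dfA, a, b)
resB <- meanByGroup(dfB, temp, rate)
```

Each result keeps the column names of its own input, so the same plotting code can refer to them with `{{ }}` as well.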
