How to refer to a column in a function

I am trying to write a function that performs two actions, the first of which subsets a data frame.

Let's say I have these two data frames.

      ID   Var1
   1   5     3
   2   6     1



      ID   Var2
   1   5     9
   2   6     2

And my function looks like this:

mu_fuc = function(df, condition) {
    workingdf = subset(df, condition < 3)

###pass working df to the other action
  } 

I am aware that for this first action I can use conditional slicing as a workaround. However, when I moved on to the second action, I realized I still have to refer to a column name in a data frame when calling another existing function.

I tried as.name(condition), but it did not work.

Thank you for your time. Any suggestion is highly appreciated.

---------Update on 12/26-----------

The method using eval worked well in one function, shown below.

meta = function(sub_data, y) {
  y <- eval(as.list(match.call())$y, sub_data)
  workingdf <- subset(sub_data, y != 999) ##this successfully grab the column named y in the dataframe## 
  
  meta1 <- metacor(y,  ##this successfully grab the column named y in the dataframe## 
                   n, 
                   data = workingdf,
                   studlab = workingdf$Author_year,
                   sm = "ZCOR",
                   method.tau = "SJ", 
                   comb.fixed = F)
  return(meta1)
}

But somehow the same approach did not work in the following code.

mod_analysis = function(meta, moderator){
  workingdf <- meta$data
  moderator <- eval(as.list(match.call())$moderator, workingdf) 
  
  output = metareg(meta, moderator)
  return(output)
}  

It produced this error message:

Error in eval(predvars, data, env) : object 'moderator' not found 

I don't know why it worked for the first function but not the second.

One can convert condition from a simple column reference to a full expression, so that the function argument carries the whole test, including its right-hand side, instead of hard coding it into the function. This can be accomplished with a couple of functions from the rlang package, enquo() and eval_tidy().

We'll illustrate this with a subsetting function and the mtcars data frame.

aSubsetFunction <- function(df,condition){
   require(rlang)
   condition <- enquo(condition)
   rows_value <- eval_tidy(condition, df)
   stopifnot(is.logical(rows_value))
   df[rows_value, ,drop = FALSE]
}

The condition <- enquo(condition) line quotes the condition expression. The eval_tidy() function evaluates the quoted expression, using df as a data mask. The output from eval_tidy(), rows_value, is a vector of logical values (TRUE / FALSE), which we use on the row dimension of the input data frame with the [ form of the extract operator. We use stopifnot() to generate an error if rows_value is not a vector of logical values.

We call the function twice to illustrate that it works with more than one column of the data frame.

aSubsetFunction(mtcars,mpg > 25)
aSubsetFunction(mtcars,carb > 4)

...and the output:

> aSubsetFunction(mtcars,mpg > 25)
Loading required package: rlang
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
> aSubsetFunction(mtcars,carb > 4)
               mpg cyl disp  hp drat   wt qsec vs am gear carb
Ferrari Dino  19.7   6  145 175 3.62 2.77 15.5  0  1    5    6
Maserati Bora 15.0   8  301 335 3.54 3.57 14.6  0  1    5    8

Using data from the original post, the solution works as follows.

df1 <- read.csv(text="ID,Val1
                5,3
                6,1")
df2 <- read.csv(text="ID,Val2
                5,9
                6,2")
aSubsetFunction(df1,Val1 < 3)
aSubsetFunction(df2,Val2 < 3)

...and the output:

> aSubsetFunction(df1,Val1 < 3)
  ID Val1
2  6    1
> aSubsetFunction(df2,Val2 < 3)
  ID Val2
2  6    2

Having illustrated the approach, we can rely on the order of evaluation in R to nest the calls and reduce the function body to a single line of R code:

aSubsetFunction <- function(df,condition){
   require(rlang)
   df[eval_tidy(enquo(condition), df), ,drop = FALSE]
}

...which produces the same output as listed above.

> aSubsetFunction(mtcars,mpg > 25)
Loading required package: rlang
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
> aSubsetFunction(mtcars,carb > 4)
               mpg cyl disp  hp drat   wt qsec vs am gear carb
Ferrari Dino  19.7   6  145 175 3.62 2.77 15.5  0  1    5    6
Maserati Bora 15.0   8  301 335 3.54 3.57 14.6  0  1    5    8
> 
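For comparison, the same quote-then-evaluate pattern can be sketched in base R without rlang. This is a minimal sketch, not part of the original answer: the name subset_base is hypothetical, and the choice to drop rows where the condition is NA is an assumption.

```r
# Hypothetical base-R counterpart of aSubsetFunction (name is illustrative).
subset_base <- function(df, condition) {
  # Capture the unevaluated expression the caller typed, e.g. carb > 4
  cond_expr <- substitute(condition)
  # Evaluate it with df's columns visible; fall back to the caller's
  # environment for anything that is not a column
  rows <- eval(cond_expr, df, parent.frame())
  stopifnot(is.logical(rows))
  # Assumption: treat NA results as FALSE so no all-NA rows are returned
  df[rows & !is.na(rows), , drop = FALSE]
}

res <- subset_base(mtcars, carb > 4)
print(res)
```

Because the data frame is used as the evaluation environment, column names in the condition resolve to columns first, which is what the data mask does in the rlang version.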

Epilogue: why can't we use variable substitution with subset()?

From an initial look at the question, one might expect that we could resolve it with the following code.

subset2 <- function(df,condition){
   subset(df,df[[condition]] > 4)
}

subset2(mtcars,carb)

However, this fails with an object not found error:

 Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x,  : 
  object 'carb' not found 

Once again, Advanced R provides an explanation, directly quoting from the documentation for subset().

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
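Following that advice, one way to repair subset2() is to pass the column name as a character string and use [ and [[ directly. The sketch below is illustrative only: the name subset2_fixed and the threshold argument are assumptions, not part of the original answer.

```r
# Hypothetical repair of subset2(): the column is named by a string.
subset2_fixed <- function(df, column, threshold = 4) {
  # Fail early with a clear error if the column does not exist
  stopifnot(is.character(column), length(column) == 1,
            column %in% names(df))
  keep <- df[[column]] > threshold
  # Assumption: rows where the comparison is NA are dropped
  df[keep & !is.na(keep), , drop = FALSE]
}

res <- subset2_fixed(mtcars, "carb")
print(res)
```

Since the column argument is an ordinary string, it can be stored in a variable or looped over, and no non-standard evaluation is involved.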

Bottom line: it's important to understand Base R non-standard evaluation when writing functions to automate one's analysis, because the assumptions coded into various R functions can produce unexpected results. This is especially true of modeling functions like lm() that rely on formula(), as Wickham describes in Advanced R: wrapping modeling functions.
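As a small illustration of wrapping a formula-based modeling function, the sketch below builds the formula from column-name strings with reformulate() before calling lm(). The wrapper name fit_by_name is hypothetical, and whether this pattern carries over to other modeling functions depends on how each one handles its formula argument.

```r
# Hypothetical wrapper: column names arrive as strings, and the formula
# is constructed explicitly so lm() sees real column names.
fit_by_name <- function(df, response, predictor) {
  stopifnot(all(c(response, predictor) %in% names(df)))
  # reformulate("wt", response = "mpg") builds the formula mpg ~ wt
  model_formula <- reformulate(predictor, response = response)
  lm(model_formula, data = df)
}

fit <- fit_by_name(mtcars, "mpg", "wt")
print(names(coef(fit)))
```

Constructing the formula first avoids handing the modeling function a bare argument name such as predictor, which it would otherwise try to look up as a column of the data.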

References: Advanced R, Chapter 20 section 4, Chapter 20 section 6

I guess you can try match.call with eval:

mu_fuc <- function(df, condition) {
  condition <- eval(as.list(match.call())$condition, df)
  workingdf <- subset(df, condition < 3)
  workingdf
}

which enables calls like these:

> mu_fuc(df1, Var1)
  ID Var1
2  6    1

> mu_fuc(df2, Var2)
  ID Var2
2  6    2

Try this with indexing:

#Function
mu_fuc = function(df, condition) {
  workingdf <- df[df[[condition]]<3,]
  return(workingdf)
} 
#Apply
mu_fuc(df1,'Var1')

Output:

mu_fuc(df1,'Var1')
  ID Var1
2  6    1

Some data used:

#Data
df1 <- structure(list(ID = 5:6, Var1 = c(3L, 1L)), class = "data.frame", row.names = c("1", 
"2"))
