
Passing variable name to a function in R

R newbie here. I am struggling to pass a variable name to my own function. I use subset inside the function because it is one of the steps, but I'd like to understand the broader logic of passing variable names, especially when they are then used as arguments to other functions (e.g. subset).

myf.subset <- function(data, xvar) {
  new.data <- subset(data, xvar == 0)
  return(new.data)
}
df <- data.frame(x = sample(c(0,1), size = 100, replace = TRUE))
myf.subset(df, xvar = x)

This does not work and returns: Error in eval(e, x, parent.frame()) : object 'x' not found

I then tried myf.subset(df, xvar = "x"), which returns an empty data frame. Another attempt was:

myf.subset <- function(data, xvar) {
  new.data <- subset(data, eval(substitute(xvar)) == 0)
  return(new.data)
}
df <- data.frame(x = sample(c(0,1), size = 100, replace = TRUE))
myf.subset(df, xvar = "x")

which again returns an empty data frame:

[1] x
<0 rows> (or 0-length row.names)

EDITED: The next step is to run a regression on that subset, in which case I would add more variables (now embedding @akron's answer):

myf.subset <- function(data, xvar, yvar, zvar) {
  xvar <- deparse(substitute(xvar))
  yvar <- deparse(substitute(yvar))
  zvar <- deparse(substitute(zvar))
  # new.data <- subset(data, xvar == 0)
  new.data <- data[data[[xvar]] == 0, , drop = FALSE]
  
  OLS <- lm(data = new.data, yvar~zvar )
  return(OLS)
}
df <- data.frame(x = sample(c(0,1), size = 100, replace = TRUE),
                 y = sample(c(0,1), size = 100, replace = TRUE),
                 z = sample(c(0,1), size = 100, replace = TRUE))
myf.subset(df, xvar = x, yvar = y, zvar = z)

Use deparse(substitute()) to convert the unquoted argument to a string, then use [[ to extract that column as a vector, compare it with 0 to build a logical vector, and subset the rows with [:

myf.subset <- function(data, xvar) {
  xvar <- deparse(substitute(xvar))
  data[data[[xvar]] == 0, , drop = FALSE]
}

-testing

> myf.subset(df, xvar = x)
   x
3  0
5  0
12 0
18 0
20 0
24 0
25 0
28 0
29 0
32 0
33 0
35 0
36 0
37 0
39 0
41 0
42 0
43 0
47 0
48 0
49 0
51 0
55 0
57 0
58 0
62 0
63 0
65 0
66 0
67 0
69 0
70 0
71 0
73 0
74 0
75 0
76 0
80 0
82 0
84 0
87 0
88 0
90 0
92 0
94 0
97 0
99 0
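To see why this works where the earlier attempts did not, here is a minimal sketch. The helper show_name is hypothetical and exists only for illustration: substitute() captures the argument as an unevaluated expression, so R never tries to look up x in the calling environment, and deparse() turns that expression into the string "x". The second half shows the likely reason the quoted attempt returned no rows: a string compared with a number is compared as strings, so the condition is a single FALSE.

```r
# Hypothetical helper, for illustration only: returns the name that was
# typed for the argument, without ever evaluating it.
show_name <- function(xvar) {
  deparse(substitute(xvar))  # unevaluated expression -> character string
}

captured <- show_name(x)     # no object called x has to exist here
print(captured)

# Comparing the string "x" with the number 0 coerces 0 to "0",
# so the test is FALSE and subsetting on it keeps no rows.
df <- data.frame(x = c(0, 1, 0, 1))
string_test <- "x" == 0
empty <- df[string_test, , drop = FALSE]
print(nrow(empty))
```

Once the name is a string, data[[xvar]] is ordinary standard evaluation, which is why it behaves predictably inside a function.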

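In the updated code, the formula yvar ~ zvar is taken literally, so lm never sees the columns y and z. Build the formula from the strings instead, either with reformulate or with paste and as.formula: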

myf.subset <- function(data, xvar, yvar, zvar) {
  xvar <- deparse(substitute(xvar))
  yvar <- deparse(substitute(yvar))
  zvar <- deparse(substitute(zvar))
  # new.data <- subset(data, xvar == 0)
  new.data <- data[data[[xvar]] == 0, , drop = FALSE]
  fmla <- reformulate(zvar, response = yvar)
  # fmla <- as.formula(paste(yvar, zvar, sep = ' ~ '))
  OLS <- lm(fmla, data = new.data)
  return(OLS)
}

-testing

> myf.subset(df, xvar = x, yvar = y, zvar = z)

Call:
lm(formula = fmla, data = new.data)

Coefficients:
(Intercept)            z  
    0.48000     -0.01333  
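Note that the printed Call shows formula = fmla rather than the actual variables. If a readable call matters, one option is to inline the formula with bquote() before evaluating lm. This is only a sketch under assumptions: the function name myf.subset2 and the fixed seed are illustrative and not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(x = sample(c(0, 1), size = 100, replace = TRUE),
                 y = sample(c(0, 1), size = 100, replace = TRUE),
                 z = sample(c(0, 1), size = 100, replace = TRUE))

# Hypothetical variant of myf.subset; the name is illustrative only.
myf.subset2 <- function(data, xvar, yvar, zvar) {
  xvar <- deparse(substitute(xvar))
  yvar <- deparse(substitute(yvar))
  zvar <- deparse(substitute(zvar))
  new.data <- data[data[[xvar]] == 0, , drop = FALSE]
  fmla <- reformulate(zvar, response = yvar)
  # bquote() replaces .(fmla) with the formula itself, so the stored
  # call records y ~ z instead of the local variable name fmla.
  eval(bquote(lm(.(fmla), data = new.data)))
}

fit <- myf.subset2(df, xvar = x, yvar = y, zvar = z)
print(fit$call)
```

The fitted model is the same as before; only the recorded call changes, which can make printed summaries easier to read.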
