I would like to subset a dataframe by referring to a column with a string and select the values of that column that fulfill a condition. From the following code
employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"
I want to get all salaries over 23000 by using the salary_string to refer to the column name.
I tried the following without success:
set <- subset(employ.data, salary_string > 23000)
set2 <- employ.data[, employ.data$salary_string > 23000]
This does not seem to work because salary_string is of type character, but what I need is some sort of "column name object". Using as.name(salary_string) does not work either. I know I could get the subset by using
set <- subset(employ.data, salary > 23000)
But my goal is to use the column name that is of type character (salary_string) once with subset(employ.data, ... ) and once with employ.data[, ...]
The short answer is: don't use subset().
Instead, use something like
employ.data[employ.data[salary_string]>23000,]
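A slightly more explicit variant of the same idea, as a minimal sketch: `[[` extracts the column as a plain vector, and the resulting logical vector selects rows. The helper name `filter_above` is hypothetical and only there for illustration; the sample data is repeated from the question so the block runs on its own.

```r
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1', '2008-3-25', '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)
salary_string <- "salary"

# Hypothetical helper: keep rows where the column named by `column`
# (a character string) is greater than `threshold`.
filter_above <- function(df, column, threshold) {
  keep <- df[[column]] > threshold      # plain logical vector
  # which() drops NA comparisons instead of returning all-NA rows
  df[which(keep), , drop = FALSE]
}

set2 <- filter_above(employ.data, salary_string, 23000)
```

Note that the condition goes in the row position (before the comma), not the column position as in the attempt from the question.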
Here's another idea:
dplyr::filter(employ.data, get(salary_string) > 23000)
Which gives:
# employee salary startdate
#1 Peter Gynn 23400 2008-03-25
#2 Jolie Hope 26800 2007-03-14
For the sake of showing how to achieve the result with subset():
The issue you're having is because subset() uses non-standard evaluation. Here's one way to substitute your string into the subset() function.
## set up an unevaluated call
e <- call(">", as.name(salary_string), 23000)
## evaluate it in subset()
subset(employ.data, eval(e))
# employee salary startdate
# 2 Peter Gynn 23400 2008-03-25
# 3 Jolie Hope 26800 2007-03-14
Or as Steven suggests, the following would also work well.
subset(employ.data, eval(as.name(salary_string)) > 23000)
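To see why the original attempts fail and why the symbol-based versions work, here is a small check. It is only a sketch on a trimmed copy of the question's data; the variable names `missing_col`, `cond` and `col_values` are made up for illustration.

```r
employ.data <- data.frame(
  employee = c('John Doe', 'Peter Gynn', 'Jolie Hope'),
  salary = c(21000, 23400, 26800)
)
salary_string <- "salary"

# `$` looks for a column literally named "salary_string"; there is none,
# so the result is NULL rather than the salary column.
missing_col <- employ.data$salary_string
is.null(missing_col)

# Inside subset(), salary_string resolves to the character value "salary",
# so the condition compares a string with a number, not the column.
cond <- salary_string > 23000
length(cond)   # a single value, recycled across all rows

# Turning the string into a symbol and evaluating it with the data frame
# as the environment yields the actual column values.
col_values <- eval(as.name(salary_string), employ.data)
```

This is essentially what eval(as.name(salary_string)) does inside subset(), where the condition is evaluated with the data frame's columns in scope.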